The Privacy and Security Risks of Web Scraping Coronavirus Data



Opinions expressed by Entrepreneur contributors are their own.

Using web scraping to track who has received the coronavirus vaccine will be a massive and complex undertaking. The need to identify vaccination locations and coordinate the work required to wipe out the virus adds to the difficulty. Tracking vaccine progress will require many interconnected databases run by federal, state, and other agencies. These databases range from legacy code written for mainframe computers and first-generation applications to modern systems.

At the same time, tracking vaccination is crucial to confirming that the virus has been beaten back. Despite the apparent benefits, it is essential to understand the privacy risks that arise from tracking vaccination progress.

Understanding the Process

A briefing with senior officials from the U.S. Department of Health and Human Services revealed that millions of vaccine doses are being shipped to hundreds of identified sites as the vaccination process gains momentum. The colossal logistical effort to fast-track the process involves multiple government agencies, private and academic partners, the military, and thousands of vaccination sites.

Failing to track the vaccine movements may ultimately cause shipment delays and lead to some patients missing critical second doses of the life-saving vaccine.

At every step of this process, the people and organizations handling the vaccine must collect and exchange sensitive data. The collected information includes details about vaccine manufacturers, vaccination sites, and state health departments. However, it all boils down to two primary data types: where the vials of vaccine are housed (the supply) and who has received them (the demand).

Operation Warp Speed, a public-private partnership initiated by the federal government, coordinates and allocates all COVID-19 doses in the U.S. The states order the vaccines, and shipping companies track the delivery. Healthcare providers such as clinics, hospitals, long-term care facilities, and other vaccination sites track the individuals receiving the vaccines and report to the government. Operation Warp Speed aims to produce and deliver 300 million doses of safe and effective vaccines, with the initial quantities available by the start of 2021.

Russia has started offering its coronavirus vaccine to any interested citizen and is encouraging every Russian to get vaccinated to slow infections. The authorities will first cover residents of retirement homes, health workers, and everyone over 75. Russia also plans to distribute approximately 4 million doses by Friday.

In almost every country distributing the vaccine, various systems have been deployed to follow up on patient progress and log adverse reactions, if any.

Simply put, web scraping for coronavirus vaccine tracking and updates relies on a suite of integrated resources and tools, both new and old. The nerve center that keeps track of all COVID-19 vaccine data sources is dubbed Tiberius (a name taken from Star Trek, the TV show that also inspired the Warp Speed name). Tiberius is designed to monitor the movement of the coronavirus vaccine. However, authorities and contractors at the local and state levels cannot view patients’ health data, since identifiers are removed.


Web scraping components involved in tracking coronavirus vaccination progress

Colonel R.J. Mikesh, who oversees information technology at Operation Warp Speed, told reporters on December 7, 2020, that at least 100 data systems will be involved in administering the vaccine. They are designed to gather state-level data on vaccine orders, shipping, and delivery, and to keep track of patients’ first and second doses.

The COVID-19 vaccine management systems include software from McKesson, a healthcare company and the operation’s chief distribution partner, as well as inventory management systems from package carriers like UPS and FedEx. CVS and Walgreens will also use their own data management systems to coordinate data from different vaccination sites. According to Mikesh, it has taken a herculean effort to ensure that the systems are connected and work seamlessly, and that their data is verified and tested. The main challenge is making sure the systems keep working well once the vaccine becomes available to more people.


Understanding the coronavirus vaccine information flow

Once patients receive their coronavirus vaccine shots, the drug store or clinic responsible for administering the vaccine sends their identifiers and names to a pre-defined government database. States and some selected cities are responsible for managing immunization information systems, commonly referred to as immunization registries.

Each state in the U.S., along with some large municipalities like New York City, operates its own immunization registry. The systems have been in existence for many years and were developed to help the government track and monitor comprehensive immunization records. They are vital since they enable authorized users like schools to access a person’s records and immunization status, regardless of where the vaccine was administered. The information systems also handle and process supply data. States use them to create records and place new orders with the Centers for Disease Control and Prevention (CDC).

Scraping vaccination data helps healthcare providers coordinate vaccination across the whole population. The amount of information created every day on coronavirus vaccine research and immunization is massive, making it almost impossible to assemble an understandable timeline by hand. With this in mind, the relevant stakeholders have turned to various information systems to scrape essential patient and vaccine data in support of COVID-19 vaccination programs.

But how does this affect the privacy of patients or organizations whose information can be accessed in any state or country? What are the information privacy implications?

Before we look into the possible privacy implications of web scraping patient and vaccine data, let’s have a look at some services and websites where individuals and organizations can scrape the information.


Various services/systems where you can scrape essential patient and vaccine data

The COVID-19 Data Lake

The COVID-19 Data Lake brings together significant data and information systems covering the vaccine’s supply and demand. It draws patient information from the IZ Data Clearing House once the personal identifiers have been scrubbed.
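
As a rough illustration of what scrubbing personal identifiers can look like, here is a minimal Python sketch. The field names and the sample record are hypothetical, and the actual process used by the IZ Data Clearing House is not described in the sources cited here.

```python
# Hypothetical field names; real registry schemas will differ.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "phone"}


def scrub_identifiers(record: dict) -> dict:
    """Return a copy of the record without direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


# Made-up example record, for illustration only.
patient = {
    "name": "Jane Doe",
    "date_of_birth": "1950-02-14",
    "state": "FL",
    "vaccine_product": "vaccine-a",
    "dose_number": 1,
}
scrubbed = scrub_identifiers(patient)
print(scrubbed)
```

Dropping direct identifiers is only a first step. The fields that remain can sometimes be combined with other datasets to re-identify a person, which is one reason the privacy questions discussed later matter.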

Johns Hopkins University

With dozens of vaccines now in clinical trials, Johns Hopkins University will collect details to understand the accelerated timelines for development, the different types of vaccines, and information about their efficacy and safety. Additionally, the university will track data on vaccination efforts. Johns Hopkins University & Medicine has created a coronavirus resource center that publishes the critical metrics for understanding vaccination progress by U.S. state, including doses administered and the percentage of the population that is fully vaccinated.

Tiberius

Tiberius is the nerve center of the entire coronavirus vaccination tracking effort. It draws information directly from in-transit data on vaccine shipments and from the COVID-19 Data Lake. The Tiberius software combines logistics data with census information to coordinate the coronavirus vaccine distribution.

CDC and Operation Warp Speed will use Tiberius to calculate the weekly allotments to different jurisdictions by considering target populations, storage capacity, and inventory. At least 600 representatives drawn from 64 jurisdictions, including federal agencies, states, and territories, will have login credentials to Tiberius, while health officials can monitor the shipment of vaccine orders.
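
The actual allocation logic inside Tiberius is not public in the sources cited here. Purely as an illustration of how target populations, storage capacity, and inventory could be combined, the following Python sketch splits the available inventory in proportion to each jurisdiction's target population and caps each share at what the jurisdiction can store. The function, field names, and numbers are all assumptions.

```python
def weekly_allotments(inventory: int, jurisdictions: dict) -> dict:
    """Split inventory by target population, capped by storage capacity.

    A simplified, hypothetical rule; not the real Tiberius algorithm.
    """
    total_population = sum(j["target_population"] for j in jurisdictions.values())
    if total_population == 0:
        return {name: 0 for name in jurisdictions}

    allotments = {}
    for name, j in jurisdictions.items():
        # Proportional share, rounded down to whole doses.
        share = inventory * j["target_population"] // total_population
        # Never send more than the jurisdiction can store.
        allotments[name] = min(share, j["storage_capacity"])
    return allotments


# Made-up figures for two hypothetical jurisdictions.
example = {
    "Jurisdiction A": {"target_population": 600, "storage_capacity": 1000},
    "Jurisdiction B": {"target_population": 400, "storage_capacity": 300},
}
result = weekly_allotments(1000, example)
print(result)
```

In this sketch, doses that a jurisdiction cannot store are simply left unallocated. A real system would presumably redistribute them or hold them for the following week.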

IZ Gateway

Suppose an individual gets his first COVID-19 shot in New York. How will healthcare providers ensure that a pharmacy located in Florida gives him the correct second shot? IZ Gateway is a centralized information exchange system for patient data. It is a technology tool that immunization registries use to share patient information. IZ Gateway is hosted by the Association of Public Health Laboratories and transmits patient information to CDC’s IZ Data Clearing House. The Data Clearing House also draws patient information from other sources, such as federal agencies and pharmacies.
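
To make the New York and Florida scenario concrete, here is a small Python sketch of the kind of lookup a shared registry enables: find the patient's first dose, then report which product the second dose must be and when it is due. The record layout, product names, and dosing intervals are hypothetical placeholders, not the IZ Gateway's actual data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DoseRecord:
    patient_id: str
    product: str
    administered_on: date
    state: str


# Hypothetical products and intervals, for illustration only.
SECOND_DOSE_INTERVAL_DAYS = {"vaccine-a": 21, "vaccine-b": 28}


def second_dose_plan(registry: List[DoseRecord],
                     patient_id: str) -> Optional[Tuple[str, date]]:
    """Return (product, due date) for the second dose, or None if no first dose is on file."""
    first = next((r for r in registry if r.patient_id == patient_id), None)
    if first is None:
        return None
    due = first.administered_on + timedelta(days=SECOND_DOSE_INTERVAL_DAYS[first.product])
    return first.product, due


# First shot recorded in New York; a Florida pharmacy queries the shared data.
registry = [DoseRecord("p-001", "vaccine-a", date(2021, 1, 4), "NY")]
plan = second_dose_plan(registry, "p-001")
print(plan)
```

The lookup works across state lines only because the record travels with the patient's identifier, which is exactly the data sharing that raises the privacy questions below.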

VTrckS

VTrckS, short for Vaccine Tracking System, is the CDC’s vaccine ordering and management system. The tracking system has been in use since 2006 with McKesson healthcare company and has been involved in distributing more than 150 million vaccination doses for diseases like flu, chickenpox, and MMR. It uses information sent through the state Immunization Information Systems to fulfill vaccine orders. Once an order is placed, the CDC sends the order details to McKesson, which ships the vaccine and reports the shipment information to the state.

Vaccine Administration Management System

The CDC plans to roll out a new web-based system called the Vaccine Administration Management System (VAMS) to track the coronavirus vaccine. The system is designed to monitor and track vaccines from when they reach designated vaccination sites to when patients receive them. VAMS can assist cities or counties in establishing vaccination sites and pooling essential data. It can also let organizations track vaccine doses scheduled for their staff members. Patients can use it to track their appointments and receive reminders once the second dose is due. Vaccine providers, in turn, can use VAMS to organize vaccines by manufacturer, schedule patients, and check inventory.

What are the Possible Privacy Risks?

The methods and information systems deployed to facilitate and monitor coronavirus vaccine rollout and immunization may hold privacy risks. Integrating information handled by pharmacies, clinics, and other vaccination centers with the state immunization databases provides malicious actors with the opportunity to misuse patient data.

The Department of Defense collaborates with private organizations and states to permit web scraping and data sharing using the immunization databases to enable the vaccine’s distribution. Such systems will allow individuals to receive their first dose in one location and take the second shot in a different state.

Tracking which vaccine was administered to which patient will assist healthcare providers in ensuring the individual takes the correct second dose.

However, what happens to the data?

The CDC released details regarding how the federal government will track side effects after the first dose. One of the ways is through daily texts asking inoculated people to describe their side effects. The method raises some red flags since agencies lack clear outlines of the safeguards used to protect the data. One of the privacy risks involved is companies using immunization information for unauthorized commercial reasons.

Also, states have different data protection rules and regulations. Differences in which data points must be secured leave patient data exposed to unauthorized access, web scraping, and use. For instance, a state like Texas has laws governing the use of personal data in marketing, but other states may not have the same level of protection.

Certain types of data are also governed by the Health Insurance Portability and Accountability Act (HIPAA). The wording of some vaccine data contracts may prevent the law from applying to every party that accesses and uses the databases. If federal health data privacy law does not cover all of the information, and contracts do not stipulate how companies may use patient data, some patients will likely have better privacy protection than others.

Besides, what would happen if a breach were to occur? For instance, healthcare personnel like pharmacists or doctors may enter a patient’s vaccine information in a database, and a third party could scrape it or a hacker could access it illegally. If a breach occurs when the data is under the government’s control, the government becomes responsible. If information is out of the control of healthcare providers, it may not be protected by HIPAA. The absence of a clear arrangement for handling data breaches, and of shared information protection policies, exposes patient data to serious cybersecurity risks and illegal web scraping activities.

Legal web scraping for legal purposes

Scraping data from the listed services providing vaccine updates has some ethical, legal, and technical limitations. Web scraping is legal when it serves a lawful purpose and complies with regulations like the General Data Protection Regulation (GDPR), which took effect in 2018.

  • Ensure that the purpose of web scraping is legal: Identify the information to be collected, data sources, and format. Ensure that the scraped information does not cause any financial or reputational damage to the data owners.
  • Get publicly available information: Some of the COVID-19 vaccine services and websites publish data for public consumption. Even if the data may legally be copied, it is better to double-check the website’s policies and terms of service (ToS). Make sure that the information you collect from the sites does not contain personal data.
  • Check copyrights: In addition to ToS and policies, websites provide copyright details that web scrapers should respect. Before scraping vaccine information, make sure that the service has not copyrighted the data.
  • Identify your web scraper: Be respectful and identify your web scraper with a legitimate user agent string. You can create a page explaining your activities and the reason for scraping data from the sites. If you deploy bots, ensure they abide by a site’s robots.txt file, which details the pages bots can access.
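
The last point can be sketched in a few lines of Python using the standard library's urllib.robotparser. The bot name, info page URL, and sample robots.txt below are hypothetical. To keep the example self-contained, it parses robots.txt text that is assumed to have been downloaded already rather than fetching it from a live site.

```python
from typing import Optional
from urllib import robotparser
from urllib.request import Request

# Hypothetical identifiers: replace with your own bot name and info page.
USER_AGENT = "ExampleVaccineStatsBot/0.1 (+https://example.org/bot-info)"


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_request(parser: robotparser.RobotFileParser, url: str) -> Optional[Request]:
    """Build a request that carries our user agent, or return None if robots.txt disallows the URL."""
    if not parser.can_fetch(USER_AGENT, url):
        return None
    return Request(url, headers={"User-Agent": USER_AGENT})


# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
allowed = polite_request(parser, "https://example.org/data/vaccinations.csv")
blocked = polite_request(parser, "https://example.org/private/patients.csv")
print(allowed is not None, blocked is None)
```

A request object is only built for paths the site permits, and every request announces who is asking. Honoring robots.txt does not replace the other checks in the list: terms of service, copyright, and personal data rules still apply.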
