AI & Privacy
Privacy, once simply described as the state of being alone or away from other people, is no longer quite so cut and dried. Today, nearly 60% of the world’s population is connected to the internet. That connectivity leads to an ever-changing definition of exactly what privacy is, how it is protected, and who will regulate the standards. In January 2019, an Experian study revealed that 70% of users were willing to share their personal information if it offered an additional benefit, such as convenience, and came with a promise of greater online security. Similarly, a survey conducted by the Center for Data Innovation showed that some 58% of Americans are willing to allow third parties to collect at least some sensitive personal data (biometric, medical and/or location data) in return for simply using their apps and/or services.
In 2016, Business Insider estimated that by 2020 there would be more than 34 billion connected devices around the globe (more than 4 per person). Today, similar estimates from IDC, Intel and the United Nations have pushed the number for 2020 to a staggering 200 billion devices (equivalent to ~26 devices per person). What began as one computer per home with an internet connection surged when smartphones hit the marketplace, and has now increased exponentially with the Internet of Things (IoT). From your smartphone, wearables and video game consoles, all the way to the “smart home” devices controlling your thermostats, door locks and appliances, technology and AI are connected to our daily lives in more ways than ever before.
As a result, we’re facing a privacy paradox – or catch-22 of sorts.
As mentioned earlier, many people are willing to provide their personal information if a service offers them a benefit in return. Data related to someone’s health and well-being goes a step further.
The collection of healthcare data through institutional and personal digital means has led to new and exciting ways in which diagnoses and therapies can be delivered. As such, there has been increasing pressure on life sciences companies to incorporate these data, in a meaningful way, into the pharmaceutical product life cycle. Whether through drug and clinical trials, focus groups or data collected from healthcare professionals (HCPs), there is an appetite in the market to expedite the creation of therapies and get them to market more quickly. However, the collection of these data also raises the concern of exposure of personally identifiable information (PII).
The culmination of the data collected
AI capabilities – such as machine learning, computer vision, natural language processing, and forecasting and optimization – can unleash the full potential of these collected data to help solve growing health challenges and dramatically change the way therapies are created and delivered. This necessary evolution will enable life sciences organizations to:
- Ensure drug safety.
- Enable pharmaceutical manufacturers to quickly determine the quality, efficacy and safety of new product candidates.
- Get new therapies to market faster.
- Accelerate clinical trials using real-world data sources.
All of the data…
Several government and regulatory agencies around the globe have been working on ways in which to protect users and their sensitive information. This protection is in the form of ‘what’ can be used, ‘where’ it can be used, sometimes ‘how’ it can be used, and the user’s right to demand it not be used at all. Sensitive information goes beyond PII alone. Personal information can be broken down into four groups:
- Personally Identifiable Information (PII) can directly link to or identify a person. It includes things like social security number, address, phone number, email address, etc.
- Quasi-Identifiers (QI) are pieces of information that may not identify anyone on their own, but when combined may lead to one’s identity or closely identifiable information. QI includes items like ZIP code, age, gender, etc.
- Sensitive Information is data about a person that must be protected. This category includes things such as salary, disease or health diagnosis, IP address, live geo-location, religion, ethnic origin, etc.
- Non-sensitive Information includes items that do not fit the previous descriptors. Broadly speaking it could mean the state/province or country in which one is accessing a website or service.
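The four categories above can be sketched as a simple field-classification map. The field names and category assignments below are illustrative only, not a reference to any statute’s definitions:

```python
from enum import Enum

class DataCategory(Enum):
    PII = "personally identifiable"
    QI = "quasi-identifier"
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"

# Illustrative mapping of record fields to the four categories.
FIELD_CATEGORIES = {
    "ssn": DataCategory.PII,
    "email": DataCategory.PII,
    "phone": DataCategory.PII,
    "zip_code": DataCategory.QI,
    "age": DataCategory.QI,
    "gender": DataCategory.QI,
    "salary": DataCategory.SENSITIVE,
    "diagnosis": DataCategory.SENSITIVE,
    "country": DataCategory.NON_SENSITIVE,
}

def redact(record: dict, allowed: set) -> dict:
    """Keep only the fields whose category is in the allowed set."""
    return {k: v for k, v in record.items()
            if FIELD_CATEGORIES.get(k) in allowed}
```

A dataset prepared for research use might then call `redact(record, {DataCategory.QI, DataCategory.NON_SENSITIVE})` to strip PII while retaining the quasi-identifiers discussed below.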
When it comes to data used for a patient’s records alone, simple agreements made with physicians or healthcare institutions are meant not only to obtain the patient’s consent for the procurement of the data, but also to ensure that it will be kept safe and protected within their systems. These data can often include all four types of information mentioned above.
Sometimes these agreements may give your consent to use your ‘sensitive’ information for study-related purposes, without connection to your PII. However, in order to provide a truly valuable dataset for machine learning (ML) models to digest, QI is often required as well.
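The re-identification risk that quasi-identifiers carry can be made concrete with a k-anonymity check: count how many records share each combination of QI values; if the smallest group has size 1, at least one person is uniquely identifiable even with all PII stripped. A minimal sketch with made-up records:

```python
from collections import Counter

def k_anonymity(records, qi_fields):
    """Return the size of the smallest group of records that share
    the same combination of quasi-identifier values (the 'k')."""
    groups = Counter(tuple(r[f] for f in qi_fields) for r in records)
    return min(groups.values())

# De-identified study data: PII removed, QI retained for the ML model.
rows = [
    {"zip": "02139", "age": 34, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "age": 34, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "94103", "age": 51, "gender": "M", "diagnosis": "asthma"},
]

# k = 1 means at least one person is uniquely identified by QI alone.
print(k_anonymity(rows, ["zip", "age", "gender"]))  # → 1
```

This is the tension the article describes: the same QI fields that make the dataset useful to an ML model are the ones that erode the anonymity the consent agreement promised.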
When the General Data Protection Regulation (GDPR) took effect in the European Union (EU) in May 2018, it raised concerns worldwide about doing business online, since its guidelines apply to citizens of the EU wherever in the world they access a website or connected application. The enactment of GDPR as law includes requirements regarding consent handling, breach disclosure, data protection and data erasure. While the GDPR is particularly stringent and well defined, the concept of PII in the U.S. is surprisingly lacking and less descriptive. However, California is leading the charge to change this with the introduction of the California Consumer Privacy Act (CCPA), which is to take effect on January 1, 2020.
Privacy standards like the GDPR, and those following its example, have made it explicit that data subjects are not to be subject to decisions based solely on automated processing by AI/ML algorithms. But isn’t that precisely what most AI/ML systems do? Regardless of the statute defining the privacy laws, AI privacy should include at least the following points:
- The system utilizing the AI must be transparent in its use.
- The AI system must have an infrastructural need for the information it is collecting.
- Consumers (definition differs by statute) must be able to opt out of the system.
- The purpose of the data collected by the AI system must be limited by design.
- Consumers should have the right to challenge predictions made by the AI system.
- The data collected by the AI system must be deleted upon consumer request.
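Taken together, these points translate into concrete engineering requirements. The following is a minimal, hypothetical sketch (all class and method names are illustrative, not drawn from any statute or library) of how a data store might enforce the opt-out and deletion points above:

```python
class ConsentAwareStore:
    """Toy data store illustrating opt-out and erasure requirements."""

    def __init__(self):
        self._data = {}        # consumer_id -> collected data
        self._opted_out = set()

    def collect(self, consumer_id, data):
        # Honor opt-outs before any collection occurs.
        if consumer_id in self._opted_out:
            return False
        self._data[consumer_id] = data
        return True

    def opt_out(self, consumer_id):
        self._opted_out.add(consumer_id)

    def erase(self, consumer_id):
        # Delete collected data upon consumer request.
        self._data.pop(consumer_id, None)

    def has_data(self, consumer_id):
        return consumer_id in self._data
```

A real system would also need the transparency, purpose-limitation and challenge mechanisms listed above, which are policy and process questions as much as code.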
We all feel secure enough relying on the Health Insurance Portability and Accountability Act of 1996 (HIPAA) when it comes to sharing personal medical information with our physicians and their peers, but at the end of 2018, a study from the University of California, Berkeley assessed that advances in AI may have rendered these rules obsolete.[1]
Given the advancements and benefits offered by AI, consumers need to be careful about what they share, who they share it with and what the intended use of that data may be. Past legal protections are often quickly outdated. Despite new laws that are being put in place or are on the near horizon, consumers must be diligent in monitoring the use of their data.
For more information on current regulations mentioned above:
- The EU General Data Protection Regulation (GDPR)
- California Consumer Privacy Act (CCPA)
- Health Insurance Portability and Accountability Act (HIPAA)
[1] Liangyuan Na, Cong Yang, Chi-Cheng Lo, et al. Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning. JAMA Network Open, 2018; 1(8): e186040. DOI: 10.1001/jamanetworkopen.2018.6040