Big data brings numerous privacy and information protection challenges; here we describe how big privacy meets big data.
Big data requires big privacy.
Recurring news of data breaches by retailers, healthcare organizations, and others makes it clear that big data initiatives create complex privacy issues. Companies and governments are collecting more of our personal information than ever before, ranging from health records to financial information to consumer behaviour. At the same time, more and more of our personal information is publicly available online. With so many sources of data that can potentially be exploited, it is easier than ever to link different data sets and uncover patterns or sensitive information. To remain compliant with privacy legislation and avoid public backlash, organizations that implement big data initiatives need big privacy. A strong privacy framework, along with effective training, tools, and implementation processes, is essential for governing big data initiatives.
A number of privacy issues have been raised in discussions of big data (see in particular Kuner et al. 2012 and Bell et al. 2014). Organizations are collecting increasing volumes of detailed personal information that is often used and shared for a variety of purposes. Privacy laws, however, generally require that personal information be collected, used, disclosed, and retained only for the purpose for which it was originally collected, and with individuals’ consent. What role does individual consent play in a big data environment? Is it possible to combine data from different sources while respecting the original purpose? Data should be retained only as long as it is needed for its original purpose; can this be enforced once data is shared? Do individuals have a right to have their information forgotten? What is government’s role in setting standards for big data, and how can organizations ensure compliance with privacy laws that were not written with big data in mind?
Big Privacy Requirements
We suggest the following concrete requirements for complying with privacy law in a big data setting:
- Managing Intent. The purpose for collection that was explained to individuals at the time of collection, along with the privacy and security standards described to them, must be upheld throughout the information life cycle. In a big data context, data sets can often be linked to other available information to create more complex data sets, or the original data can be altered in unforeseen ways. Organizations must ensure that all security and privacy requirements applied to their original data sets are tracked and maintained throughout the information life cycle, from collection through use, retention, disclosure, and destruction.
- Anonymization of Secondary Use Data. Where data is used for secondary purposes, it must be anonymized or tokenized to protect individual privacy. Data that is not properly anonymized before external (or in some cases internal) release may lead to a privacy compromise, because it can be combined with other complex data sets such as geo-location, image recognition, and behavioural tracking data. If data is not completely anonymized, the possibility of linking it with other data sets must be evaluated, as third parties with access to several data sets may be able to combine them and re-identify otherwise anonymous individuals. Tokenization is an alternative that masks sensitive elements while preserving referential integrity, so records about the same individual can still be joined; because whoever holds the token mapping or key can link tokens back to individuals, tokenized data is pseudonymous rather than anonymous.
- Third Party Evaluation. Organizations must evaluate the privacy and security practices of third parties that request access to data. Even if the data is de-identified, linking data sets from several sources may make it possible to re-identify individuals. Organizations that share individual-level data are obliged to ensure that recipients’ security and privacy protections are adequate.
- Legislative Obligations. Organizations need to understand how privacy legislation applies in their context and keep up with legislative changes. Laws and regulations generally do not address big data specifically, so laws governing the collection, use, and storage of particular types of personal information (e.g., financial, health, or children’s information) must be interpreted in context. One major issue is the use of cloud storage for personal information: British Columbia, for example, legislated that personal information held by public bodies must be stored and accessed only in Canada (Freedom of Information and Protection of Privacy Act, s. 30.1), effectively restricting the use of cloud service providers that store data outside Canada, because of jurisdictional differences in privacy legislation.
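The tokenization option described under Anonymization of Secondary Use Data can be illustrated with a minimal Python sketch. It assumes a keyed HMAC-SHA256 as the tokenization function, which is one common approach rather than the only one, and a hypothetical secret key held by the data custodian; the field names and records are invented for illustration. Because the same input and key always produce the same token, records about one individual in different data sets still join on the token, while the raw identifier is not released.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the data custodian. In practice it
# would come from a key management system, never from source code.
TOKEN_KEY = b"replace-with-a-securely-stored-secret"


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Replace a sensitive value with a deterministic keyed token (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_records(records, sensitive_fields, key: bytes = TOKEN_KEY):
    """Return copies of the records with the listed sensitive fields tokenized."""
    out = []
    for record in records:
        masked = dict(record)  # copy so the source data is left untouched
        for field in sensitive_fields:
            if field in masked:
                masked[field] = tokenize(str(masked[field]), key)
        out.append(masked)
    return out


# Two hypothetical data sets that share a health card number.
visits = [{"health_card": "1234-567-890", "clinic": "A"}]
prescriptions = [{"health_card": "1234-567-890", "drug": "X"}]

t_visits = tokenize_records(visits, ["health_card"])
t_prescriptions = tokenize_records(prescriptions, ["health_card"])

# The tokens match, so the two released data sets can still be joined,
# but the raw health card number no longer appears in either of them.
print(t_visits[0]["health_card"] == t_prescriptions[0]["health_card"])  # True
print(t_visits[0])
```

Anyone holding the key can recompute the token for a known identifier, which is why this remains pseudonymization. One design option, again an assumption rather than a prescribed practice, is to use a different key for each data recipient: each recipient's tokens are then internally consistent, but two recipients cannot link their releases to each other on the token.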
Big privacy is not simply a list of requirements, however. Protecting privacy in a big data environment demands a comprehensive approach that pervades information management. Our approach to big data privacy identifies specific focus areas that organizations should address when developing their privacy and security programs, grounded in a legislative framework to ensure regulatory compliance.
Big Privacy Priorities
Our priorities for developing big privacy for big data are the following:
- Establish a big privacy taxonomy. Because legal definitions vary by jurisdiction, your organization will need privacy legal expertise to determine the meaning of the legal terms used in its jurisdiction. Essential terms to examine are:
  - The identity scale: Laws in various jurisdictions generally define “identifying information,” and may also define “anonymous” or “pseudonymous” information. For example, the Safe Harbor method under the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule lists 18 specific identifiers that must be removed for health information to be considered de-identified. Other jurisdictions offer a vaguer definition of “identifying information,” such as information from which the identity of an individual can be “readily ascertained,” so how such requirements are interpreted matters a great deal. The prelude to any big privacy analysis is therefore to refine the meaning of, and specific indicators for, terms such as “identifying” or “non-identifying” data, “anonymous,” and “pseudonymous.”
  - Permissible disclosures: Laws define the purposes, such as research or system use analysis, for which information may be disclosed.
  - Permissible purposes: Laws also define appropriate purposes and intent for the collection and retention of data.
- Create a big privacy data flow map. Implementing big privacy starts with defining information use cases and the business processes that govern data flows. This involves mapping the intersections between data types, parties, and purposes of use; in other words, who will receive data, which data sets they will receive, and for what purposes.
- Adopt big data threat risk assessment tools. A key consideration when integrating any new data feed is that the risk of re-identification can increase whenever existing data feeds are combined with new ones. Threat risk assessment tools used in a big data context must be able to measure the privacy risk that results from linking different data sets.
- Implement big privacy consent. Individuals have the right to decide how much of their personal information to share, with whom, and for what purposes. Implementing consent in a big data context means being able to track permitted purposes, disclosures, and thresholds of identification for each individual.
- Implement big privacy tools or services. Privacy tools and services for big data include anonymization, de-identification, and automated risk calculation.
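The point that linking data feeds raises re-identification risk, and the automated risk calculation mentioned above, can be made concrete with a small Python sketch. It uses k-anonymity as the risk measure, which is one common metric among several, and treats the worst-case re-identification risk of a record as 1/k, where k is the size of the smallest group of records sharing the same quasi-identifier values. The records, the token values, the new postal feed, and the choice of quasi-identifiers are all hypothetical.

```python
from collections import Counter


def min_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share the
    same combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def max_reid_risk(records, quasi_identifiers):
    """Worst-case probability of re-identifying a single record: 1 / k."""
    return 1 / min_group_size(records, quasi_identifiers)


def link(records, feed, field):
    """Join an attribute from another feed onto each record via its token."""
    return [dict(r, **{field: feed[r["token"]]}) for r in records if r["token"] in feed]


# Hypothetical de-identified release: tokenized IDs plus coarse attributes.
released = [
    {"token": "t1", "age_band": "30-39", "region": "North"},
    {"token": "t2", "age_band": "30-39", "region": "North"},
    {"token": "t3", "age_band": "40-49", "region": "South"},
    {"token": "t4", "age_band": "40-49", "region": "South"},
]
# Hypothetical new feed keyed on the same tokens, adding a postal prefix.
postal_feed = {"t1": "V5K", "t2": "V6B", "t3": "V8W", "t4": "V8W"}

quasi_ids = ["age_band", "region"]
before = max_reid_risk(released, quasi_ids)

linked = link(released, postal_feed, "postal_prefix")
after = max_reid_risk(linked, quasi_ids + ["postal_prefix"])

print(f"max risk before linking: {before:.2f}")  # 0.50
print(f"max risk after linking:  {after:.2f}")   # 1.00
```

Each data set looks acceptable on its own, but once the postal prefix is joined on, two records become unique and the worst-case risk rises from 0.50 to 1.00. k-anonymity assumes the attacker already knows the quasi-identifier values and says nothing about what the remaining attributes reveal, so a measure like this is best read as one input to a threat risk assessment rather than the whole assessment.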
Meeting privacy standards in a big data context means re-examining legal and policy obligations, risk management practices, and data sharing in light of greater interconnection between data sets, individuals and organizations. Maintaining a high standard of big data privacy is possible, but will require revising the processes, tools, and governance of information management.
References
Bell, Greg, Doron Rotman, and Mike VanDenBerg. 2014. “Navigating Big Data’s Privacy and Security Challenges.” KPMG.
Kuner, Christopher, Fred H. Cate, Christopher Millard, and Dan Jerker B. Svantesson. 2012. “The Challenge of ‘Big Data’ for Data Protection.” International Data Privacy Law 2 (2): 47–49. doi:10.1093/idpl/ips003.
British Columbia. Freedom of Information and Protection of Privacy Act, RSBC 1996, c. 165, s. 30.1.
About the Author:
Waël Hassan, PhD, is the editor in chief and lead writer of Transigram, an online monthly magazine. Transigram explores legislative and regulatory changes, new technologies, and the needs and challenges of data custodians, and provides insight into the development of our approaches to open data access strategies and models. It provides summaries, analyses, insights, and commentaries on business transformation in the areas of Governance, Risk & Compliance, Project & Portfolio Management, IT Strategy & Operations, and Technological Tool Management.