Current information security and privacy classifications are being applied with some difficulty to the environment of Big Data, which in the context of the public sector involves the emergence of large databases and increased data sharing. Such databases are usually classified as high-risk, resulting in costly security safeguards. However, de-identification can drastically lower the actual privacy risk posed by information. Could mapping de-identification to risk classifications allow organizations to invest more efficiently in security safeguards and to take advantage of the opportunities of Big Data?
Threat Risk Assessments (TRAs) are commonly required for new Canadian government programs and other public sector initiatives in order to determine whether their information assets are being protected appropriately. Their focus is on security: they examine the potential for harm if information is accessed, released, or used inappropriately; analyze potential risks to information; and identify appropriate lifecycle safeguards to protect it.
In 2005, the federal government released the Canadian Information Security and Privacy Classification Policy as a guideline for risk assessments. This system defines four risk levels, based on criteria such as potential threats to public safety, injury to individuals or enterprises, financial loss, and damage to government relationships and reputation. Appropriate safeguards are identified for each risk level. The Ontario Ministry of Government Services has since adopted these classifications as a guide for TRAs within the Ontario Public Service.
Applying these classifications to a broad variety of public sector contexts has led to two significant problems, both related to the phenomenon of Big Data. The federal classification guidelines were clearly designed with a political context in mind: examples given for the various risk levels include cabinet documents, briefings, speeches, and contact information. At the provincial level, these classifications translate with some difficulty to contexts such as healthcare, where information is collected in large volumes and regularly shared between organizations. The first problem is that the large volume of information contained in healthcare databases creates great potential for harm in the event of a breach, and consequently such databases are usually classified as high-risk. The safeguards mandated to protect high-risk information are costly, and with the emergence of Big Data, these costs are likely to grow sharply as databases grow. The second problem pertains to information sharing: not only is there a possibility that classified information will be shared with parties whose security safeguards are inadequate, but the sharing of personal information also raises a number of more basic privacy issues.
At this point, it is helpful to shift from a focus on security to the broader perspective of privacy. On the one hand, it is possible for information to be protected by adequate security safeguards but to violate privacy law nonetheless. An important issue in the healthcare sector has been that of cascading rights when organizations share personal health information for research purposes. While all of the organizations involved may have effective security practices, the information is often disclosed and used for purposes to which patients did not consent; because the information is then stored in multiple locations, it is often also retained longer than mandated by privacy standards. On the other hand, it is possible to protect privacy without security. An important innovation in the field of privacy has been the development of sophisticated and efficient de-identification processes, which remove identifying details from records containing personal information while preserving the utility of data for research. Properly de-identified information can be shared with only a minimal risk to privacy.
These connections between privacy and security have two implications. First, process matters when it comes to protecting data. Excellent security safeguards will not ensure proper information management if privacy concerns are not integrated into business processes and practices. Second, de-identification can radically change information risk. Calculations of re-identification risk (the probability that an individual could be identified based on their de-identified data record) provide an objective measure of privacy risk. When privacy risk is very low, fewer security safeguards are needed. Thus, mapping levels of de-identification to information risk classifications could enable much more efficient and effective investment in information safeguards. An approach that unites privacy and security with regard to risk classification could well be the means to unlock the opportunities offered by Big Data while containing the costs of information security.
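To make the idea of a re-identification risk calculation concrete, here is a minimal sketch in Python. It assumes one simple and common model, not a method prescribed by any of the policies discussed above: records that share the same values on a set of quasi-identifiers form an equivalence class, and each record's risk is taken to be one divided by the size of its class. The function names, the sample fields, and especially the thresholds and labels in `classify` are hypothetical and chosen only for illustration; a real mapping to risk classifications would have to be set by policy.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, average_risk) under a simple equivalence-class model.

    Each record's risk is 1 / (number of records sharing its
    quasi-identifier values). This is an illustrative assumption.
    """
    if not records:
        raise ValueError("at least one record is required")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


def classify(max_risk, thresholds=((0.05, "low"), (0.2, "medium"), (0.5, "high"))):
    """Map a risk value to a label. Thresholds and labels are hypothetical."""
    for limit, label in thresholds:
        if max_risk <= limit:
            return label
    return "very high"


# Hypothetical de-identified records: direct identifiers already removed,
# age generalized to a band, postal code truncated to a prefix.
records = [
    {"age_band": "30-39", "region": "M5V", "diagnosis": "A"},
    {"age_band": "30-39", "region": "M5V", "diagnosis": "B"},
    {"age_band": "30-39", "region": "M5V", "diagnosis": "A"},
    {"age_band": "30-39", "region": "M5V", "diagnosis": "C"},
    {"age_band": "40-49", "region": "K1A", "diagnosis": "B"},
]

max_risk, avg_risk = reidentification_risk(records, ["age_band", "region"])
print(max_risk, avg_risk, classify(max_risk))
```

In this sample, one record is unique on its quasi-identifiers, so the maximum risk is 1.0 and the dataset would land in the most restrictive hypothetical category. Further generalizing or suppressing that record would lower the measured risk, which is the kind of change that a mapping from de-identification to classification levels would need to capture.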