An Introduction to Anonymization and the Identification Spectrum
Data anonymization services have become regulatory and marketplace requirements for organizations that use or share individuals’ personal data for secondary purposes, mainly research and analysis. For anonymization to be implemented effectively within an organization, everyone who handles personal data needs to have some understanding of when it is needed and how it works. This is not usually the reality. We offer a brief introduction to the concept of “identifying information” and how anonymization protects identity.
There has never been a more critical time for organizations to anonymize personal data holdings in reliable, effective ways. Data anonymization services have become regulatory and marketplace requirements in Canada and around the world for organizations that use or share individuals’ personal data for secondary purposes, mainly research and analysis. Hospitals that share clinical data for public health and research purposes, banks that use data from client financial records for financial forecasting, and retailers that gather customer data to analyze consumer behaviour are a few of the organizations for whom anonymization is becoming a routine part of operations. The recent order by the Information and Privacy Commissioner of Ontario (IPC) on the linkability of postal codes is a good example of how some identity attributes can lead to identity discovery.
However, even in organizations in which anonymization is a standard part of operations, it is not unusual for the majority of staff not to know when anonymization is needed or how it works. Anonymization is often entrusted to technical or business staff with little knowledge or training in the area. Even when the staff performing anonymization has adequate training, management and other staff may continue to make decisions about data management based on insufficient knowledge. Both scenarios reinforce that everyone who handles personal data needs to have a basic understanding of the role of anonymization in protecting privacy.
The concept of anonymization is simple enough: anonymizing a data set means masking or altering data that could identify individuals. It is the techniques used to anonymize data that more often seem difficult to understand. However, understanding anonymization really begins with understanding what data needs to be anonymized, rather than how it is done. In other words, what types of data should be considered identifying information?
Obviously individuals’ names, health card numbers, and so on, constitute identifying information. Yet removing these identifiers does not guarantee anonymity. Combinations of data such as gender, age, significant dates and postal codes can identify individuals, particularly when linked to information available from public sources such as online registries and directories, news websites, and social media. Different types of data cannot be categorized as identifying or non-identifying, but rather fall on an identification spectrum, sometimes referred to by privacy professionals as “nymity.”
The Identification Spectrum
The high end of the identification spectrum is often called verinymity. A verinym is a “true name,” one that asserts an individual’s identity in a certain jurisdiction: for example, the name on an individual’s government-issued birth certificate, driver’s license, or passport.
A verinym can also be any piece of identifying information that is unique to an individual. For example, a driver’s license number is a verinym; so is a healthcare number. In the online world, an email address or an IP address can also be considered a verinym.
Transactions in which a verinym is revealed are said to provide verinymity.
Verinyms have a few important properties:
Permanence: Verinyms are, for the most part, hard to change. Generally, even if you do change a verinym, there is a record of the change, which will link your old name to your new one.
Traceability: Verinyms, such as cell phone GPS locators, IP addresses, and log-in names, make it possible to trace many of an individual’s actions and activities.
Reference Point: Any verinymous transaction that an individual performs links the verinym to other data elements. For example, a credit card transaction connects a credit card number to a name, location, time and purchase. A person with database access can easily locate this information by using a verinym as a search term.
Leakage: Verinymous transactions make it possible, if not easy, to gather considerable amounts of information about an individual by cross-referencing different databases, each of which is indexed by a verinym.
Volume: Verinyms are attached to enormous volumes of data. Online transactions and interactions, in particular, create massive volumes of data that can be retained and analyzed.
These properties are what make identity theft so problematic. If an impostor misuses one of your verinyms and gives it a bad credit record, it can be difficult to get the situation resolved, since you can’t change the verinym you use (permanence), and you can’t separate the transactions you made from the ones the impostor made (leakage).
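The reference point and leakage properties described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the card number, the two databases, and the build_profile helper are invented for illustration and do not come from any real system. It shows only that once several databases are indexed by the same verinym, a single lookup key is enough to assemble a profile.

```python
# Hypothetical databases, each indexed by the same verinym (a made-up card number).
purchases = {
    "CARD-0001": [("2015-03-02", "pharmacy", "Ottawa")],
    "CARD-0002": [("2015-03-05", "bookstore", "Toronto")],
}
loyalty = {
    "CARD-0001": {"name": "J. Doe", "email": "jdoe@example.com"},
}
databases = {"purchases": purchases, "loyalty": loyalty}


def build_profile(verinym, databases):
    """Collect everything each database holds under the given verinym."""
    return {
        name: db[verinym]
        for name, db in databases.items()
        if verinym in db
    }


# One search term links a name and email to a purchase, place and date.
profile = build_profile("CARD-0001", databases)
print(profile)
```

The more databases that share the same index, the more complete the resulting profile becomes, which is why removing or replacing verinyms is usually the first step in anonymization.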
At the bottom of the identification spectrum are transactions that reveal no information at all about the identity of the participant. We say that transactions of this type provide unlinkable anonymity: no information about your identity is revealed, and there is no way to tell whether or not you are the same person that performed a given prior transaction.
The most common example in the physical world is paying for groceries with cash. In such a transaction, no information about your identity is recorded.
Between the two extremes of verinyms and completely anonymous data fall many attributes (characteristics) of individuals that may be included in a data set. These attributes could possibly provide a link to a verinym (an individual’s identity), particularly when combined with other information.
Linkable attributes may not identify an individual with absolute certainty, but can narrow the possibilities down to a small number of individuals. For example, a postal code alone could identify a family, since a number of Canadian postal codes contain only one household. Even a larger urban postal code, when combined with attributes such as a racial identifier and the number of children an individual has, could narrow down the possibilities to one or two people.
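To make the narrowing effect concrete, the following Python sketch counts how many records share each combination of linkable attributes. The records, the placeholder postal codes, the group labels "A" and "B", and the group_sizes helper are all invented for illustration. The smaller a group, the fewer individuals a given combination could refer to.

```python
from collections import Counter

# Hypothetical records: (postal_code, group_label, number_of_children).
records = [
    ("A1A 1A1", "A", 0),
    ("A1A 1A1", "A", 2),
    ("A1A 1A1", "B", 2),
    ("A1A 1A1", "A", 0),
    ("A1A 1A1", "B", 3),
    ("B2B 2B2", "A", 1),
]


def group_sizes(rows, columns):
    """Count how many rows share each combination of the given column indexes."""
    return Counter(tuple(row[i] for i in columns) for row in rows)


# Postal code alone: one code covers five records, the other only one.
by_postal = group_sizes(records, [0])

# All three attributes together: most combinations now match a single record.
by_all = group_sizes(records, [0, 1, 2])
unique_combinations = sum(1 for size in by_all.values() if size == 1)

print(by_postal)
print(by_all)
print(unique_combinations)
```

In this made-up data, the second postal code is identifying on its own, while the first becomes identifying for most of its residents only once the other two attributes are added.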
Linkable attributes include any that are visible when an individual is in public, such as sex, height, approximate age, race, visible disabilities, and pregnancy. Other linkable attributes are date of birth, diagnosis, and treating physician. In short, any attribute that provides clues to an individual’s identity is potentially linkable.
Implications for anonymization
Privacy law requires strong protections for personal information, which is defined as information that could readily identify an individual. The goal of anonymization, when identification is understood as a spectrum, is to reduce the amount of linkable information to such a point that a data set no longer poses a significant risk of identifying individuals, and can therefore be shared more freely. Anonymization means not only deleting fields such as names and birth dates, but also evaluating the probability that any particular data field could be combined with other available information to identify individuals. Effective anonymization is not just a technical exercise, but a risk management process requiring specialized training and tools to ensure that individuals’ unique sets of characteristics, not just their names, are concealed.
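One simple way to picture this kind of risk evaluation is sketched below in Python. It is an illustration under stated assumptions, not a prescribed method: the records are invented, the risk measure (one divided by the size of the smallest group of identical records) is just one common worst-case estimate, and the generalize helper is hypothetical. The sketch compares the risk before and after coarsening postal codes to their first three characters and ages to ten-year bands.

```python
from collections import Counter

# Hypothetical records: (postal_code, age).
records = [
    ("A1A 1A1", 34),
    ("A1A 1B2", 37),
    ("A1A 2C3", 31),
    ("B2B 1A1", 52),
    ("B2B 3D4", 58),
]


def max_risk(rows):
    """Worst-case risk estimate: 1 / size of the smallest group of identical rows."""
    return 1 / min(Counter(rows).values())


def generalize(row):
    """Keep the first three postal code characters and a ten-year age band."""
    postal, age = row
    low = (age // 10) * 10
    return (postal[:3], f"{low}-{low + 9}")


generalized = [generalize(row) for row in records]

print(max_risk(records))      # every original record is unique
print(max_risk(generalized))  # the smallest generalized group has two records
```

Generalization lowers the risk at the cost of detail, so in practice the acceptable threshold and the fields to coarsen depend on who will receive the data and what other information they could combine it with.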
IPC ON – Order PO-3429 – Ministry of Community Safety and Correctional Services
Senior Editor: Esther Townshend
Copyright: All rights reserved, © Waël Hassan
About the Author:
Waël Hassan, PhD, is the editor in chief and lead writer of Transigram, an online monthly magazine. Transigram explores legislative and regulatory changes, new technologies, and the needs and challenges of data custodians. We provide insight into the development of our approaches to open data access strategies and models. Transigram offers summaries, analyses, insights, and commentaries on business transformation in the areas of Governance, Risk & Compliance, Project & Portfolio Management, IT Strategy & Operations, and Technological Tool Management.