Big Data and Innovation – Setting the Record Straight: De-Identification Does Work

Properly applied de-identification is an effective tool for protecting privacy, yet recent criticisms have suggested the opposite. The perpetuation of this myth could adversely affect health research, innovation and Big Data insights. To address the misconceptions surrounding de-identification, the paper examines a select group of articles that are often cited in support of the myth that individuals in de-identified datasets are at risk of being re-identified through linkages with other available data. It examines how the academic research cited has been misconstrued and finds that the primary reason for the popularity of these misconceptions is not factual inaccuracy or error within the literature, but rather a tendency on the part of commentators to exaggerate the risk of re-identification. While the research does raise important issues concerning the use of proper de-identification techniques, it does not suggest that de-identification should be abandoned.

It is crucial to dispel this myth lest the capability of de-identifying data be viewed as a barrier to innovation. This is simply not the case. It is indeed possible to strongly de-identify data, achieving a high degree of privacy while preserving the level of data quality required for analysis. Maximizing both privacy and data quality enables a shift away from a zero-sum, either/or paradigm to an inclusive, positive-sum paradigm, a key principle of Privacy by Design. This doubly enabling, “win-win” strategy avoids unnecessary trade-offs and allows data analytics to advance in ways never before thought possible.
