Untying the link: Why should retailers change course on linking data?

Recent news about customer credit card data breaches at Target and Home Depot has made retailers take a closer look at their security practices, but privacy has largely been left out of the coverage. Retailers’ collection and linking of detailed customer information for the purpose of sales analytics has created an increased risk of identity theft that needs to be addressed.

Published in Transigram, Sept 2014.

News over the last few months about major customer data breaches at Target and Home Depot has been an eye-opener for retailers. Large retailers are asking one of two questions: either, “How can we prevent this from happening again?” or, more importantly, “What is the root problem that led to these breaches?”

So how did this happen? The breaches were attacks on the cash registers by computer worms. A computer worm is a standalone malicious program that replicates itself in order to spread to other computers. It often spreads over a computer network, relying on security failures in the target computer to gain access. Unlike a computer virus, it does not need to attach itself to an existing program. In the case of the Home Depot breach, the worm mimicked an anti-virus program in order to delay detection, and in fact was discovered only after several months, when banks recognized signs of credit card theft. While developing the worm required a certain level of expertise, it could have been installed simply by covertly plugging a USB key into a cash register for a few minutes.

Data linking and the new cash registers

It is easy to forget that these breaches would have been impossible only a few years ago. When mechanical registers were still in use, credit card processing was done over the phone or a dedicated line. This configuration was secure because the only link between the payment and the sale was a paper receipt.

Digital cash registers were developed for two reasons: to enable digital record-keeping, and to collect data for analysis of consumer purchasing patterns. Retail success depends on a company’s ability to predict future purchasing in order to guide product design, manufacturing, and delivery timelines. This requires linking multiple data points to understand consumer trends, purchasing patterns, purchasing power, and shifts in competition. One major source of data is detailed customer sales information. For instance, with more sophisticated digital cash registers, retailers are able to link an individual’s purchases even when they are made with different credit cards. Using complex analytical models, retailers could possibly even connect credit cards belonging to members of the same family and gain a broader picture of their purchasing patterns as a family. The underlying principle here is that more connections between data points contribute to better analytics capabilities and more accurate sales forecasts.

The first digital registers were very basic terminals connected to a larger computer in the background. These terminals were built for specific and dedicated tasks, with little or no user interface. In the last few years, cash registers have become very sophisticated, using dedicated or generic computers that offer a user interface, touch screens, self-checkout, and so on. Now that most registers run general-purpose operating systems such as Windows, OS X, Android, or Linux, the potential for hacking using worms or viruses written for these systems has grown considerably. Since cash registers are used by a large number of operators, are located in open spaces, and are easy to tamper with, they are definitely a weak link in data security.

Linking data without risking identity theft

We are not recommending that retailers scale back the sophistication of their registers, but only that they reduce privacy risk to the same level as older systems. Retailers need records of payment transactions, and they need records of sales. For analytics purposes, they want to connect individuals’ purchases at different times using information from credit cards. Linking sales records to payment transactions is not inherently problematic, but saving customers’ names and credit card numbers together creates a substantial risk of identity theft, as the recent breaches have demonstrated.

Registers need to be designed in such a way that even if a register is hacked or stolen, a perpetrator could not connect credit card numbers or purchase records to individuals’ names.

This means that linking should not be done using customers’ names, but using pseudonymous identifiers that cannot be linked to an individual’s real identity. Researchers have been publishing papers for several years on linkable and unlinkable anonymity, and the technologies to put it into practice are available.
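To make the idea concrete, here is a minimal sketch of one possible way to build a linkable pseudonym: a keyed hash (HMAC-SHA256) of the card number. It is an illustration, not a description of any retailer's system or of a specific scheme from the research literature. The function names, the record layout, and the assumption that the secret key is managed outside the register (so that a stolen register does not also expose the key) are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(card_number: str, key: bytes) -> str:
    """Derive a stable pseudonymous identifier from a card number.

    Hypothetical helper: the same card and key always give the same
    identifier, so purchases can be linked, but the identifier cannot be
    turned back into a card number or a name without the secret key.
    """
    # Normalize the input so "4111 1111..." and "4111-1111..." match.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()


def record_sale(sales_log: list, card_number: str, items: list, key: bytes) -> dict:
    """Append a sale record that holds only the pseudonym and the items.

    Neither the customer's name nor the card number is stored in the log.
    """
    record = {"customer_id": make_pseudonym(card_number, key), "items": list(items)}
    sales_log.append(record)
    return record


if __name__ == "__main__":
    # Assumption: in practice the key would be provisioned and stored
    # separately from the register; here it is generated for the demo only.
    key = secrets.token_bytes(32)
    log = []
    record_sale(log, "4111 1111 1111 1111", ["hammer"], key)
    record_sale(log, "4111-1111-1111-1111", ["nails"], key)
    record_sale(log, "5500 0000 0000 0004", ["paint"], key)

    # Same card: the two purchases share one pseudonym and can be linked.
    print(log[0]["customer_id"] == log[1]["customer_id"])
    # Different card: a different pseudonym.
    print(log[0]["customer_id"] == log[2]["customer_id"])
```

A keyed hash is used here rather than a plain hash because card numbers come from a small, structured space, so an unkeyed hash could be reversed by simply trying every possible number. The protection in this sketch therefore depends entirely on keeping the key away from the register's stored data.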

The same strategy we propose for protecting customer information stored in cash registers can be applied to other scenarios along the supply chain in the retail space. Stay tuned.

Copyright © Waël Hassan 2014

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License


  1. Transigram: http://transigram.com, 2014.
  2. Barwise, Mike. “What is an internet worm?” BBC. Retrieved 9 September 2010. http://www.bbc.co.uk/webwise/guides/internet-worms
  3. “Difference between a computer virus and a computer worm”. UCSB ScienceLine. http://scienceline.ucsb.edu/getkey.php?key=52