Big Data Strategies for Fighting Breaches

Database security and data access do not need to be competing priorities. Big data strategies can help fight data theft, enhance analytics, and make it easier to share anonymized or aggregate data.

Breaches in the big data world

Every month or two, we hear about a major corporate or government data breach due to “hacking.” One of the latest involves LastPass, a company that allows people to store their passwords online: encrypted versions of users’ master passwords were stolen. In fact, as in most of these incidents, this is not a case of hacking but of data theft. Data is stolen either through stolen passwords or by rogue employees. The usual response of strengthening security does little to address these risks. How, then, can data theft be prevented?

The best way to prevent data theft is simply to reduce the number of people who have access to sensitive personal data, and the scope of each user’s access. Big data strategies offer promising possibilities here, by making it easier to work with anonymized or aggregate data rather than raw personal data. Big data techniques also offer more sophisticated monitoring capacities, making it easier to link user activity on different devices and detect high-risk access patterns.

I believe that databases designed with big data strategies in mind will have the following characteristics:

1. There will be no system administrator role with full access, but instead a system configuration role.

One of the weak points of role-based access control is that system administrators typically have access to the entire contents of a database, including raw data and user credentials. Because system administrators typically manage all of an organization’s databases, a data thief who manages to steal a system administrator’s passwords will very likely gain access not only to an entire database, but to the organization’s entire data holdings.

It has mostly been taken for granted that database administrators need full access, but is this really the case? Database administrators are usually responsible for developing and designing database strategies, monitoring system use, and improving database performance and capacity. The role also includes coordinating and implementing security safeguards. A system configuration role could support the same structural work, including changes to the database design, without providing access to the raw data. In a fully encrypted and obfuscated database environment, data is never stored in clear text, which opens the possibility of a configuration role that can manage data without viewing it.

The shift to a system configuration role would also involve separating database configuration duties from access management. That is, the system configuration role should not include access to everyone’s access credentials. Access management can be delegated to an external identity system that is not hosted within the database environment. With this separation of duties, a stolen database will be inaccessible, because authorization is administered by a third-party tool. Such separation will provide stronger defenses against outside attacks.
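The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real database API: `EncryptedTable`, `ConfigRole`, `Database`, and `external_authorize` are hypothetical names, cell values are treated as opaque ciphertext bytes, and the external identity system is reduced to a plain function. Python itself does not enforce the boundary; the point is only that the configuration role exposes structural operations and no read method, and that the database holds no credentials.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EncryptedTable:
    """Hypothetical table: every cell value is held only as ciphertext bytes."""
    columns: List[str]
    rows: List[Dict[str, bytes]] = field(default_factory=list)


class ConfigRole:
    """Hypothetical configuration role: structural operations, no read method."""

    def add_column(self, table: EncryptedTable, name: str) -> None:
        if name in table.columns:
            raise ValueError(f"column {name!r} already exists")
        table.columns.append(name)
        for row in table.rows:
            row[name] = b""  # empty placeholder; no existing value is read

    def row_count(self, table: EncryptedTable) -> int:
        # Capacity monitoring needs sizes, not contents.
        return len(table.rows)


class Database:
    """Stores no credentials; every read is authorized by an external callable."""

    def __init__(self, table: EncryptedTable,
                 authorize: Callable[[str, str], bool]) -> None:
        self._table = table
        self._authorize = authorize  # stand-in for an external identity system

    def read_row(self, token: str, index: int) -> Dict[str, bytes]:
        if not self._authorize(token, "read"):
            raise PermissionError("external identity system denied access")
        return dict(self._table.rows[index])


def external_authorize(token: str, action: str) -> bool:
    # Assumption: in practice this would be a call to a separate service.
    return token == "valid-clinician-token" and action == "read"


table = EncryptedTable(columns=["diagnosis"],
                       rows=[{"diagnosis": b"\x8f\x1a\x03"}])
ConfigRole().add_column(table, "follow_up")
db = Database(table, external_authorize)
```

In this sketch, a stolen administrator password is useless for reading data: the configuration role never had a read path, and `db.read_row` raises `PermissionError` for any token the external function does not accept.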

2. Role-based access will be replaced by use-based access.

As we described in our most recent post, role-based access control governs which records and fields a user can access, while use-based control governs the form in which the user can access the data. Role-based access control sets limits on which groups of records and which data fields each user can access. For example, medical professionals in a clinic or hospital ideally will have access only to their own patients’ records; administrative staff will not have access to full patient records, but only to contact, scheduling, and billing information.

Use-based control focuses instead on the level of identifiability of the data that a user can access: individual data records, anonymized data, or aggregate data. For example, in a healthcare environment, only clinicians and frontline administrative staff need access to individual patient records. Research, analysis, and management purposes can generally be fulfilled by anonymized or aggregate data. A database that incorporates use controls can support access to individual records for users who need this data, as well as anonymized or aggregate data for users who do not.
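The three levels of identifiability can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Use` enum, the `query` function, the sample records, and the choice of `name` as the only direct identifier. Dropping direct identifiers is also a deliberately simplified stand-in for anonymization; real de-identification has to deal with quasi-identifiers such as age or postal code as well.

```python
from enum import Enum
from statistics import mean


class Use(Enum):
    INDIVIDUAL = "individual"   # full records, e.g. for treating clinicians
    ANONYMIZED = "anonymized"   # row-level data without direct identifiers
    AGGREGATE = "aggregate"     # summary statistics only


# Made-up sample data for the sketch.
RECORDS = [
    {"name": "Patient A", "age": 34, "systolic_bp": 118},
    {"name": "Patient B", "age": 52, "systolic_bp": 135},
    {"name": "Patient C", "age": 46, "systolic_bp": 128},
]

# Assumption: only "name" directly identifies a person in this toy schema.
DIRECT_IDENTIFIERS = {"name"}


def query(records, use):
    """Return the data in the form permitted for the given use."""
    if use is Use.INDIVIDUAL:
        return [dict(r) for r in records]
    if use is Use.ANONYMIZED:
        return [
            {k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
            for r in records
        ]
    if use is Use.AGGREGATE:
        return {
            "count": len(records),
            "mean_age": mean(r["age"] for r in records),
            "mean_systolic_bp": mean(r["systolic_bp"] for r in records),
        }
    raise ValueError(f"unknown use: {use!r}")


anonymized = query(RECORDS, Use.ANONYMIZED)
summary = query(RECORDS, Use.AGGREGATE)
```

A researcher granted `Use.ANONYMIZED` receives rows with no `name` field, and a manager granted `Use.AGGREGATE` receives only counts and means, so neither ever holds an identifiable patient record.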

3. Access to data will be based on history of access.

This means that users’ data requests will be checked against their access history in order to detect and block suspicious activity. Anyone who accesses too many records in a given period of time, or accesses data unrelated to their duties, will be blocked from the database. Access from unfamiliar computers or devices can also be blocked, or require additional verification.
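A history-based check of this kind might look like the following Python sketch. The `AccessMonitor` class, its thresholds, and the three outcomes (`"allow"`, `"verify"`, `"blocked"`) are hypothetical choices made for illustration. The sketch covers only the volume limit and the unfamiliar-device check; detecting access that is unrelated to a user's duties would need knowledge of those duties and is left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class AccessMonitor:
    """Hypothetical monitor that checks each request against access history."""

    def __init__(self, max_records: int, window_seconds: float) -> None:
        self.max_records = max_records
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)
        self._known_devices: Dict[str, Set[str]] = defaultdict(set)
        self._blocked: Set[str] = set()

    def register_device(self, user: str, device: str) -> None:
        self._known_devices[user].add(device)

    def check(self, user: str, device: str, now: float) -> str:
        """Return 'allow', 'verify', or 'blocked' for one record request."""
        if user in self._blocked:
            return "blocked"
        if device not in self._known_devices[user]:
            return "verify"  # unfamiliar device: ask for extra verification
        history = self._history[user]
        # Forget accesses that have fallen out of the sliding time window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_records:
            self._blocked.add(user)  # too many records in the window
            return "blocked"
        history.append(now)
        return "allow"


monitor = AccessMonitor(max_records=3, window_seconds=60)
monitor.register_device("analyst", "office-laptop")
results = [monitor.check("analyst", "office-laptop", now=t) for t in (0, 1, 2, 3)]
```

Here the first three requests within the minute are allowed and the fourth blocks the account, while a request from an unregistered device returns `"verify"` without touching the history. Timestamps are passed in explicitly so that the behavior is easy to reason about and test.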