by Barbara J. Mack and Elizabeth Bruce
In January 2015, Harvard held its annual symposium on the future of computation in science and engineering, this year focused on “Privacy in a Networked World.” Several members of our MIT Big Data Privacy working group attended, and in this post we summarize a few of the highlights.
What is privacy?
Salil Vadhan, Director of the Center for Research on Computation and Society at Harvard, set the stage by highlighting what we perceive to be aspects of privacy: the right or freedom not to be watched, to use and disclose our own data as we see fit, to maintain our anonymity, and to conduct our private affairs free of interference by businesses, governments, and other entities. He noted the classic tensions between these goals and the risks that often require a level of compromise: individual and national security; disease and epidemics; and the prevention and prosecution of crimes, which require law enforcement capabilities and legal channels. Vadhan asked us to reflect on the current state of technology and on what is new or different in this big data environment. What mechanisms are available to assist with goals on both sides, and what needs more work — in the courts or legislation, or in personal perspectives and awareness?
Edward Snowden and Bruce Schneier
The day kicked off with a conversation between Edward Snowden, former system administrator of the National Security Agency (NSA), who joined the conference via video, and Bruce Schneier, a Fellow at the Berkman Center [WATCH VIDEO]. Speaking to a full house, Schneier and Snowden engaged in a discussion of cryptography, from the strengths and weaknesses of algorithms and their implementation to the activities of hackers, whether they are independent, loosely organized, or backed by a national government or other complex and well-funded groups. Snowden discussed the significant developments in cryptography over the past few decades, but also noted the wide range in levels of sophistication in communities around the world.
Schneier and Snowden gave opinions on how things are changing as the Internet matures, such as the fact that many of the technologies we now rely on — TCP/IP, Microsoft’s Skype, Cisco’s routers, Google’s Gmail, and others — are used by people around the globe. “We all use the same stuff,” commented Schneier. “Which means that whenever you develop techniques to attack their stuff, you are also developing techniques that leave our stuff vulnerable.” In terms of government surveillance, Snowden commented that we are seeing the NSA shift from a “defensive” organization (i.e., passive listening) to much more of an “offensive” player, for which attack tools and computer hacks are common.
Schneier and Snowden shared thoughts on targets, the expenditure of resources, and the value of information obtained. Essentially, an attack extracts a certain amount of valuable information, Snowden pointed out, but the value of that information to the target and in the marketplace varies wildly. He cited a story from Der Spiegel [cited in a number of English language news sources including CIO Magazine] that covered the repeated compromise of North Korean networks by foreign governments, including the NSA, and yet these attacks appear to have missed signs of impending missile launches, nuclear tests, and changes in leadership. In contrast, a single successful attack on Sony Corporation resulted in vast economic damages for the firm.
In summarizing his comments, Snowden referred to the Silk Road case and noted that the mastermind behind Silk Road used full encryption, yet in the end they were reading his diary out loud in court. Silk Road illustrates that even when you are using the Dark Net and sophisticated encryption techniques, you cannot stay anonymous on the Internet. “Encryption is not foolproof,” said Snowden. “The end points will always be a weakness.”
In closing, Schneier mentioned his new book, “Data and Goliath,” due out in March 2015.
NSA did not engage in unlawful activity
In an interesting counterpoint, the next speaker was John DeLong, Director of the Commercial Solutions Center at the NSA. DeLong said he would not pursue an Oxford-style debate in response to some of Snowden’s comments, but instead offered his perspective on the NSA and its mission. He referred to a speech by Geoffrey R. Stone, University of Chicago Professor of Law, who served on the President’s Review Group on NSA surveillance and related issues (in Fall 2013), in which Stone stated, “The Review Group found no evidence that the NSA had knowingly or intentionally engaged in unlawful or unauthorized activity.” [What I told the NSA]
DeLong emphasized the importance of public discussion and the work of formal governmental boards and panels, asserting that there should be more groups of mathematicians, computer scientists, engineers, lawyers, and policy people involved in key debates and initiatives from the beginning. DeLong referred to the tensions between privacy, safety, and security, noting that, “law is the math and science of human interactions”; it is a set of rules, in essence, a grouping of functional requirements for modern society. Quoting Einstein, DeLong said, “Politics is more difficult than physics,” and commented that it is important to break out of self-reinforcing circles, to meet and talk to new people. Perhaps privacy is the ultimate applied science, he suggested.
Health data: today’s informed consent process keeps valuable data in silos
John Wilbanks, Chief Commons Officer of Sage Bionetworks, offered comments on “Privacy and Irony in Digital Health Data.” His organization, a non-profit, is seeking ways to assemble medical research studies across very broad populations and to provide greater access to the data, both to the patients themselves and to other researchers. Wilbanks noted that a recent study on mood conducted by Facebook had 689,000 participants; a study on people with Parkinson’s disease had 1,800. “It is ironic,” he said, “that privacy laws affect how we look at health data.” In short: today’s informed consent process puts valuable medical data into silos, particularly since many patients may not really understand what is at stake. In order to protect the data, its use is highly restricted and may never be available again for the purposes of other studies, impeding progress in the search for cures to major debilitating diseases, like cancer and Parkinson’s disease.
The team at Sage is focused on protecting data donor privacy and is developing user interfaces that help explain to potential participants how the data will be used and what the implications are, both in terms of the research and their personal privacy. He explained that each researcher is required to submit a short video in which the researcher commits to upholding an ethics code and informs patients about the research, giving patients far greater transparency than usual into who is conducting the work. The team will be evaluating how this model of open studies affects research protocols, and is also analyzing what happens to predictive models and regulatory decision-making processes in an environment of dynamic consent, where people may opt in and opt out over time, altering the nature of the data set even when historical data is preserved.
Cynthia Dwork outlined “The Mete and Measure of Privacy,” based on her research as a Distinguished Scientist at Microsoft Research. Dwork discussed her work on differential privacy (DP), noting the tradeoffs between the protection of privacy and the preservation of statistical utility in the data. Citing a rich array of prior research, she explained that by gaining a better understanding of the loss of privacy and the possible harms involved, it will be possible to develop approaches that make use of a “privacy budget”; one challenge now is to determine the meaning of the measure of privacy loss (in DP terminology, the epsilon factor) and to develop ways to set it appropriately. Dwork also addressed data privacy issues in the context of advertising, noting the role of emotional states in purchasing decisions and questioning whether the advertising industry of the future may cross boundaries into mood manipulation in order to target personalized ads and increase sales based on individuals’ personal data.
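For readers unfamiliar with the mechanics, the classic way to answer a statistical query under differential privacy is to add calibrated Laplace noise before releasing the result. The sketch below is our own minimal illustration, not code from the talk; the function names and parameters are invented for exposition:

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential draws, each with mean `scale`."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon):
    """Release a counting query with epsilon-differential privacy.

    A count has sensitivity 1: adding or removing one person's record
    changes it by at most 1, so Laplace noise with scale 1/epsilon
    suffices. Smaller epsilon means stronger privacy but a noisier,
    less useful answer -- the utility/privacy tradeoff Dwork described.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

The “privacy budget” idea follows directly: each query spends some epsilon, and the losses compose, so answering many queries at a given epsilon each consumes a correspondingly larger total budget.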
Tech industry moving to encrypt user data
Betsy Masiello, Senior Manager of Global Public Policy at Google, described the challenges of “Protecting Privacy in an Uncertain World.” Masiello presented an impressive series of data points and anecdotes on privacy issues and highlighted the steps that Google and other technology companies are taking to increase protections for individual users. Masiello said Google is moving to encrypt all of its services, from Gmail to search, noting that encryption is still the best way to protect consumers from identity fraud. She also mentioned a new privacy tool in development at Google, RAPPOR, an open-source project using “practical” differential privacy — “a novel privacy technology that allows inferring statistics about populations while preserving the privacy of individual users.” [github: https://github.com/google/rappor]
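RAPPOR itself layers Bloom filters and two rounds of randomized response, but the core idea — randomize each user’s report on their own device so that only population-level statistics are recoverable — can be shown with plain randomized response. This is a much-simplified sketch of that underlying technique, with names and parameters of our own choosing:

```python
import random

def randomized_response(true_bit, p=0.75):
    """Each user reports their true bit with probability p and the
    flipped bit otherwise, so any individual report is deniable."""
    return true_bit if random.random() < p else 1 - true_bit

def estimate_fraction(reports, p=0.75):
    """The aggregator knows the noise process, so it can invert it:
    E[observed] = f*p + (1 - f)*(1 - p), solved here for f."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)
```

No single report reveals a user’s true value, yet with enough reports the population fraction can be estimated accurately — the tradeoff Masiello’s “practical” differential privacy framing points at.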
Masiello also referred to a recent working paper by Chander and Le, “Breaking the Web: Data Localization vs. the Global Internet,” saying that “data localization” — the push to require that data be stored within a particular country’s borders rather than distributed across data centers around the globe (driven by the idea that different legal regimes apply depending on where the data resides) — will ultimately harm security and privacy. Google has spoken out against data localization, suggesting that we need global government surveillance reform instead.
Public opinion on privacy
Lee Rainie, Director of Internet, Science and Technology Research at the Pew Research Center, provided a view on the current state of public opinion research in the privacy domain, with a clever play on the term “surveillance,” adding “sous-veillance” (observing from underneath, or from the grassroots level up) and “co-veillance” (observing each other). Rainie presented a summary of recent findings from a study on public perceptions of privacy and security in the post-Snowden era. Here is a quick summary of public perceptions when it comes to privacy:
Privacy is not binary. People understand that it is a function of context.
Personal control is a dominant feature (right to be forgotten, right to access etc.).
Trade-offs are part of the bargain. People understand that they give up some privacy for some benefits.
Younger people are much more focused on networked privacy.
Many people know that they do not know everything about how their data is being used.
There is a growing sense of hopelessness and trust is fading.
Rainie concluded by saying that we, the people and organizations in this room, have a job to do: convince people that they should care about online privacy and that they can affect the future.
The symposium ended with a round table discussion on society’s visions and demands for privacy, moderated by Danny Weitzner, Director of the MIT CSAIL Decentralized Information Group and professor of public policy in MIT’s Computer Science Department. Panelists included Kobbi Nissim (professor of computer science at Ben-Gurion University and visitor at CRCS), Nick Sinai (Venture Partner at Insight Venture Partners, former U.S. Deputy CTO at The White House, and current Fellow at Harvard Kennedy School’s Shorenstein Center for Media, Politics, and Public Policy), and Latanya Sweeney (Director of the Data Privacy Lab at Harvard, Professor of Government and Technology in Residence at Harvard University and former CTO of the Federal Trade Commission). The group addressed the forms of privacy that require protection and considered what aspects of the debates around privacy issues are new in the current technological environment.