If you've been following the fallout from last week's NSA surveillance revelations, you may have seen repeated reference to a certain "recent MIT study." "Unique in the Crowd: The Privacy Bounds of Human Mobility," published in Nature's Scientific Reports earlier this year, has been cited by multiple media sources, including this one, as evidence for why -- contra Dianne Feinstein -- your metadata matters. Indeed, re-examined in light of the current headlines, the concerns raised by the study seem quite prescient.
The paper's authors, Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen, and Vincent D. Blondel of MIT and the Université catholique de Louvain, examined a dataset of 15 months of anonymous cell-phone data from 1.5 million people in a "small European country." (They're a bit coy about how they obtained the data.)
There were no names, addresses, or phone numbers in the data, yet they argue that "if individual's patterns are unique enough, outside information can be used to link the data back to an individual." In fact, just four points of observation, each consisting of the time of a call and the nearest cell-phone tower, were enough to uniquely identify 95 percent of individuals in the database.
In other words, if I make four calls from four different places over the course of a 15-month period, my pattern of movement could be identified out of a population two and a half times the size of Washington, D.C. If you were able to cross-reference that with my Twitter feed, say, you'd be able to build a pretty good picture of who I am. The pattern still worked when the researchers "coarsened" their sample by using less specific time observations and lumping multiple cell-phone towers into one. The way we move through the world -- and share data while we're doing it -- is pretty distinctive, even at high altitude.
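To make "coarsening" concrete, here is a minimal Python sketch. It is not the researchers' code: the `coarsen` function, the integer encoding of hours and towers, and the trick of lumping towers together by dividing their ID numbers are all assumptions made for illustration (a real analysis would group towers that are geographically adjacent).

```python
from typing import Tuple

# Hypothetical encoding: (hour index, tower id). Not taken from the study.
Point = Tuple[int, int]


def coarsen(point: Point, hours_per_bin: int, towers_per_cell: int) -> Point:
    """Map a (hour, tower) observation onto a coarser time bin and tower cluster.

    Assumption: towers with nearby ID numbers are treated as neighbors, which
    stands in for real spatial clustering.
    """
    hour, tower = point
    return (hour // hours_per_bin, tower // towers_per_cell)


# Two observations that differ at full resolution...
a = (13, 40)  # 1 p.m., tower 40
b = (14, 41)  # 2 p.m., tower 41

# ...become indistinguishable once hours and towers are lumped in groups of four.
fine_equal = coarsen(a, 1, 1) == coarsen(b, 1, 1)
coarse_equal = coarsen(a, 4, 4) == coarsen(b, 4, 4)
```

In this toy version, coarser bins make individual observations collide, which is why one would expect coarsening to make people harder to single out.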
"We use the analogy of the fingerprint," said de Montjoye in a phone interview today. "In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset."
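The kind of computation de Montjoye describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the names `is_unique` and `unicity`, the three made-up traces, and the encoding of each point as an (hour, tower) pair are all hypothetical. The idea is to pick a handful of known points from one person's trace, count how many traces in the dataset contain all of them, and report the share of people for whom the answer is exactly one.

```python
import random
from typing import Dict, FrozenSet, Tuple

# Hypothetical encoding: each observation is (hour index, tower id).
Point = Tuple[int, int]
Traces = Dict[str, FrozenSet[Point]]


def is_unique(user: str, traces: Traces, p: int, rng: random.Random) -> bool:
    """Draw up to p known points from one user's trace and check whether
    that user's trace is the only one containing all of them."""
    own = sorted(traces[user])
    known = set(rng.sample(own, min(p, len(own))))
    matches = [u for u, trace in traces.items() if known <= trace]
    return matches == [user]


def unicity(traces: Traces, p: int, seed: int = 0) -> float:
    """Fraction of users singled out by p randomly chosen points."""
    rng = random.Random(seed)
    return sum(is_unique(u, traces, p, rng) for u in traces) / len(traces)


# Made-up traces for three people; real data would span 15 months.
toy = {
    "alice": frozenset({(8, 1), (12, 2), (18, 3), (22, 1)}),
    "bob": frozenset({(8, 1), (12, 2), (18, 4), (22, 5)}),
    "carol": frozenset({(9, 6), (12, 2), (18, 3), (22, 1)}),
}

all_points = unicity(toy, p=4)  # every trace here differs in at least one point
one_point = unicity(toy, p=1)   # depends on which point happens to be drawn
```

With all four points known, everyone in the toy dataset is unique; with a single point the result depends on which observation is drawn, since several of these traces share points such as `(12, 2)`.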
Hidalgo says that because phone companies like Verizon need to keep this kind of data for billing and customer service purposes, it seemed inevitable that it would sooner or later be put to questionable use. "It felt quite natural that something like this was taking place, but the scale was certainly surprising," he said.
The authors have an op-ed in the Christian Science Monitor today arguing that consumers should be granted more control over and more information about how much of their data is being stored and for what purpose.
"We also need to have a debate about what is possible and what needs to be done," de Montjoye told me. "It's sad that beforehand we did not have an open discussion about what we, as a society, find acceptable or unacceptable to do with this data."
At the very least, he hopes the current revelations will serve as a wake-up call to the public that it is time for this discussion to happen. Their study "shows that when it comes to rich metadata datasets, there are no clear cut between anonymous and not anonymous data. Achieving anonymity is really hard and might even be algorithmically impossible."