You Are What You Click: On Microtargeting
False Promises of Privacy
Promises of anonymity are misleading and far from absolute. In a famous 2000 study, Latanya Sweeney determined that a voter list could be correlated with medical records at a rate of 87 percent based not on any personal information but on three pieces of demographic data: sex, ZIP code and birth date. This allowed the “anonymized” medical data to be linked to a particular name.
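The linkage described above can be sketched in a few lines of Python. This is a minimal illustration, not Sweeney's actual procedure: every record, name and field label below is invented, and the only assumption carried over from the study is that the two lists share the same three demographic fields.

```python
from collections import defaultdict

# Hypothetical toy data: every name and value here is invented for illustration.
voter_list = [
    {"name": "Alice Example", "sex": "F", "zip": "02138", "birth_date": "1961-07-14"},
    {"name": "Bob Sample", "sex": "M", "zip": "02138", "birth_date": "1975-03-02"},
    {"name": "Carol Placeholder", "sex": "F", "zip": "02139", "birth_date": "1961-07-14"},
    {"name": "Dan Demo", "sex": "M", "zip": "02139", "birth_date": "1980-11-30"},
    {"name": "Dave Demo", "sex": "M", "zip": "02139", "birth_date": "1980-11-30"},
]

# "Anonymized" medical records: names removed, demographics left in place.
medical_records = [
    {"sex": "F", "zip": "02138", "birth_date": "1961-07-14", "diagnosis": "condition A"},
    {"sex": "M", "zip": "02139", "birth_date": "1980-11-30", "diagnosis": "condition B"},
    {"sex": "M", "zip": "02138", "birth_date": "1975-03-02", "diagnosis": "condition C"},
]

# The three demographic fields shared by both lists.
QUASI_IDENTIFIERS = ("sex", "zip", "birth_date")


def demographic_key(record):
    """Reduce a record to its (sex, ZIP code, birth date) triple."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(voters, records):
    """Attach a name to each medical record whose triple matches exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[demographic_key(voter)].append(voter["name"])

    linked = {}
    for record in records:
        names = names_by_key.get(demographic_key(record), [])
        if len(names) == 1:  # a unique triple pins the record to one person
            linked[names[0]] = record["diagnosis"]
    return linked


reidentified = link(voter_list, medical_records)
print(reidentified)
```

In this toy data, two of the three medical records attach to a single name. The third stays ambiguous because two voters share the same sex, ZIP code and birth date, which is the only protection such a record has.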
But it is not just those three pieces of data: enough anonymous data of any form allows for a positive identification. In 2006, Netflix offered up a huge, seemingly anonymous data set of the complete video ratings of nearly half a million members. In their 2008 paper “Robust De-anonymization of Large Sparse Datasets,” computer scientists Arvind Narayanan and Vitaly Shmatikov showed that very little knowledge was required to correlate one of the anonymous lists with an Internet Movie Database account: an overlap of even a half-dozen films between Netflix’s list and an IMDb account could suffice to make a highly likely positive match. Because many IMDb accounts use people’s real names and other identifying information, they provide a foothold for obtaining a person’s entire viewing history.
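A rough sketch of the overlap idea follows. It is a deliberately simplified stand-in, not the scoring method from the Narayanan and Shmatikov paper: it only counts shared titles, while a realistic attack would also weigh ratings, dates and how obscure each film is. The user IDs, film titles and the `min_overlap` threshold of six (chosen to echo the "half-dozen" figure) are all assumptions made for illustration.

```python
# Hypothetical toy data: user IDs and film titles are invented for illustration.
anonymous_ratings = {
    "user_001": {"Film A", "Film B", "Film C", "Film D", "Film E", "Film F", "Film G"},
    "user_002": {"Film A", "Film H", "Film I", "Film J"},
    "user_003": {"Film B", "Film C", "Film K", "Film L", "Film M"},
}

# Films rated publicly under a real name on another site.
public_profile = {"Film A", "Film B", "Film C", "Film D", "Film E", "Film F"}


def best_match(profile, candidates, min_overlap=6):
    """Return the anonymous ID sharing the most films with the public profile.

    Returns None unless the top candidate shares at least `min_overlap` films
    and strictly beats every other candidate.
    """
    scores = sorted(
        ((len(profile & films), user_id) for user_id, films in candidates.items()),
        reverse=True,
    )
    if not scores:
        return None
    top_score, top_id = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    if top_score >= min_overlap and top_score > runner_up:
        return top_id
    return None


match = best_match(public_profile, anonymous_ratings)
print(match)
```

Once a match is found, everything else in that anonymous list, including the films never mentioned publicly, belongs to the named person.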
A Netflix user history may seem like a fairly harmless example of “reidentification.” Other data sets that are released “anonymously,” including consumer purchases, website visits, health information and basic demographic information, appear more menacing. When AOL Research released a large data set of Internet searches for 650,000 users of AOL’s search engine in 2006, The New York Times and others were immediately able to identify some of the users by finding personal information in the search queries. AOL admitted its error, but the data remains out there for anyone to view. Notoriously, there was User 927, whose searches included “beauty and the beast disney porn,” “intersexed genitals,” and “oh i like that baby. i put on my robe and wizards hat.”
In his 2010 paper, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” law professor Paul Ohm wrote:
Reidentification combines datasets that were meant to be kept apart, and in doing so, gains power through accretion: Every successful reidentification, even one that reveals seemingly nonsensitive data like movie ratings, abets future reidentification. Accretive reidentification makes all of our secrets fundamentally easier to discover and reveal. Our enemies will find it easier to connect us to facts that they can use to blackmail, harass, defame, frame, or discriminate against us. Powerful reidentification will draw every one of us closer to what I call our personal “databases of ruin.”
Ohm’s predictions are dire, yet his underlying point is irrefutable: anonymized data is far more valuable, and far less private, than we intuitively think, because the data gains value only at large scale. The bits and pieces of this data we contribute, free and without compensation, become parts of large, profit-making machines.
Reidentification is a key aspect of giving value to data. Data frequently has value even without total reidentification: marketers do not need to know your name or your Social Security number to show you ads. But if the value of the data goes up for credit bureaus and others that can determine your true identity, there is a strong incentive for them to do exactly that.
Hence, many privacy policies today give a false sense of security. Parsing the language of these policies is not easy, and because the information’s value changes depending on how much other data it is collated with, such guarantees are at best naïve and at worst disingenuous.
As for subscription services, such as those offered by Apple, Google and Facebook, anonymity doesn’t really exist. These companies already know who you are. Ex–Google CEO Eric Schmidt described the Google+ social network as fundamentally an “identity service” providing a verifiable “strong identity” for its users—one that requires you to use your real name. Needless to say, actions performed while logged in to this identity are far less anonymous than those performed under a pseudonym or when not signed in.
As long as you are signed in with an account from one of these services, there is nothing to prevent every action you take on its sites from being associated with your account. By default, Google records your entire search history if you are logged in with a Google account. The organization Europe Versus Facebook, founded by law student Max Schrems, has publicized the extent of Facebook’s data collection. With the help of EU laws, Schrems obtained Facebook’s internal record of him, a thousand-page dossier containing more or less everything he had ever done on Facebook: invites, pokes, chats, logins, friendings, unfriendings and so on. The accrual of all possible data, unabashedly personal data, is the industry standard. The restrictions, where they exist, are only on how that data is used.