You Are What You Click: On Microtargeting
The Information Pipeline
Consumer data is collected, assimilated, processed, resold and exploited by a leviathan encompassing hundreds, if not thousands, of companies playing dozens of roles: ad brokers, data exchanges, ad exchanges, retargeters, delivery systems, trading desks. Some of these companies are well-known: Google, Facebook, Apple, Twitter, Yahoo. Beyond these, there are far more obscure entities trying to collect similar data, but their web presence is minimal. Companies like Acxiom, BlueKai, Next Jump and Turn build up demographic profiles so that advertising can be targeted as precisely as possible.
Ironically, the activities of the big names are more mysterious than those of smaller companies you haven’t heard of. The operations of Facebook, Google and Amazon encompass the end-to-end collection of data: aggregation, usage and targeting. With their own private data sets, they don’t need to engage in transactions with, say, data exchanges. The greatest use to which they can put their data is internal.
If you know who the smaller companies are, it’s somewhat easier to find out what they do than it is to know what Google or Facebook is doing with your data. That said, Facebook and Google may not be doing single-handedly what these other companies do collectively; for many reasons, both business- and image-related, their agendas are different. But any company with a sufficient amount of consumer data can replicate the working model of the advertising ecosystem, and that should be enough to raise alarms. The model, though complex, breaks down into four stages through which consumer information is obtained and put to use: observation, collection, aggregation and targeting.
Whenever you browse the web, you leave a permanent trace of your activity. Every machine and device connected to the Internet has an IP address. The IP address does not identify you, but neither is it wholly anonymous. Blocks of IP addresses are associated with particular Internet service providers (ISPs), and most are geographically specific. This information is public: by visiting a website, you enable the website owner to learn roughly where you are located. An ISP may assign you a different IP address over time, but frequently the address remains the same, so repeat visits can also be tracked.
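To make this concrete, here is a minimal sketch in Python of what a website owner could do with nothing but the IP addresses in an access log. The block-to-ISP table, the ISP and place names, and the function names are all hypothetical, and the addresses come from ranges reserved for documentation; a real site would consult a much larger public or commercial database.

```python
import ipaddress
from collections import Counter

# Hypothetical allocation table: address block -> (ISP, region).
# Real lookups rely on far larger registries; these entries are made up.
BLOCKS = {
    ipaddress.ip_network("203.0.113.0/24"): ("Example ISP", "Springfield"),
    ipaddress.ip_network("198.51.100.0/24"): ("Another ISP", "Shelbyville"),
}


def locate(ip):
    """Return the (ISP, region) whose block contains this address, if any."""
    addr = ipaddress.ip_address(ip)
    for network, owner in BLOCKS.items():
        if addr in network:
            return owner
    return None


def repeat_visitors(log):
    """Count requests per address and keep those seen more than once."""
    counts = Counter(ip for ip, _path in log)
    return {ip: n for ip, n in counts.items() if n > 1}


# A toy access log of (ip, path) pairs.
access_log = [
    ("203.0.113.7", "/"),
    ("198.51.100.20", "/about"),
    ("203.0.113.7", "/article"),
]
```

Calling locate on an address from the log yields an ISP and a region rather than a name, and repeat_visitors picks out the address that came back, which is all the paragraph above claims.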
Cookies are little pieces of data that websites ask your web browser to store. They can contain almost any data, but they’re frequently used for remembering user preferences: the language you speak, for example, or your login and password. They also provide an easy way to tell when the same user is returning to a site. Browsers will send cookies back only to the site that sent them, but there are few limitations on what the site does with that knowledge; it can, for instance, sell it to any number of other companies.
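The mechanics of recognizing a returning user are simple enough to sketch in a few lines of Python, using the standard library's http.cookies module. The cookie name visitor_id and the handle_request function are assumptions made for illustration, not any particular site's implementation.

```python
import uuid
from http.cookies import SimpleCookie


def handle_request(cookie_header, seen):
    """Recognize a returning browser, or tag a new one with a fresh ID.

    cookie_header is the raw Cookie header the browser sent (or None).
    seen maps visitor IDs to visit counts. Returns the visitor ID and the
    Set-Cookie value to send back, or None if no new cookie is needed.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "visitor_id" in cookie:
        # The browser returned our cookie: this is a repeat visit.
        visitor_id = cookie["visitor_id"].value
        set_cookie = None
    else:
        # First visit: mint a random ID and ask the browser to store it.
        visitor_id = uuid.uuid4().hex
        out = SimpleCookie()
        out["visitor_id"] = visitor_id
        set_cookie = out["visitor_id"].OutputString()
    seen[visitor_id] = seen.get(visitor_id, 0) + 1
    return visitor_id, set_cookie
```

On the first call the site hands out an identifier; on every later call the browser volunteers it, and the visit count for that identifier goes up.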
Companies like BlueCava have figured out a way to track online behavior without cookies. BlueCava’s device identification platform attempted to identify individual users based on which browser and device they were using, information that is sent along with every request to a web server. (BlueCava now describes the service it provides as “multi-screen identification capabilities,” presumably because it’s harder to decipher what that actually means.) This is one of the reasons privacy remedies focused on particular technical mechanisms, such as the (mostly ignored) “do not track” header or third-party cookies, cannot suffice on their own.
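How BlueCava's platform works internally is not spelled out here, so the following Python sketch is only a generic illustration of cookieless identification: it hashes a handful of headers that browsers send with every request. The choice of headers and the fingerprint function are assumptions, and real systems presumably combine many more signals.

```python
import hashlib

# Headers a browser sends with each request; this selection is illustrative.
FINGERPRINT_FIELDS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def fingerprint(headers):
    """Derive a short, stable identifier from request headers alone."""
    parts = [headers.get(name, "") for name in FINGERPRINT_FIELDS]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]
```

Nothing is stored on the user's machine, so clearing cookies changes nothing: the same browser configuration produces the same identifier on the next visit.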
The situation is different with sites like Facebook and Twitter, which require users to sign up for an account that they are encouraged to remain logged in to. Unless you micromanage your web privacy settings and browser activity, these sites have the ability to track you across the web. Every time you go to a site that has a Facebook “like” button or a Twitter “tweet” button or a Google “+1” button, or a site that lets you comment with your Facebook or Twitter or Google account, those companies know that you’ve visited the site, whether or not you click on the button. And every time you click the “like” button or authorize an application, Facebook eagerly hands over your data to the online gaming company Zynga, and to newspapers and publishing companies. Sharing your information with a third-party application on Facebook is akin to poking a hole in a water balloon: only one prick is needed for everything to leak out.
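Why an unclicked button is enough can be seen from the widget provider's side. When a page embeds the button, the browser fetches it from the provider's server, and that request may carry both the provider's own login cookie and a Referer header naming the page being read. The Python sketch below assumes a cookie called session and a function called record_widget_load, both hypothetical; whether the cookie and the full Referer are actually sent depends on the browser's privacy settings.

```python
from collections import defaultdict
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def record_widget_load(headers, history):
    """Log which site a logged-in user was on when the widget loaded.

    headers is a dict of the request headers received by the widget
    provider. history maps a user's session value to the sites seen.
    Returns the site recorded, or None if the user cannot be identified.
    """
    cookie = SimpleCookie()
    cookie.load(headers.get("Cookie", ""))
    referer = headers.get("Referer")
    if "session" not in cookie or not referer:
        return None
    user = cookie["session"].value
    site = urlsplit(referer).hostname
    history[user].append(site)
    return site


# Example: one logged-in user reading a news story that embeds the button.
history = defaultdict(list)
record_widget_load(
    {"Cookie": "session=alice123", "Referer": "https://news.example/story"},
    history,
)
```

No click is involved at any point; loading the page is what triggers the request.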
The lie of the web is that each page is a discrete entity. This was true a generation ago, when pages were merely formatted text, but now that they host all sorts of code and cookies, it’s more accurate to think of web pages as collages of content, advertisements, federated services and tracking mechanisms that can talk to one another to a lesser or greater degree depending on your browser’s privacy settings. The web is becoming a tightly connected mass of trackers and bugs, a single beast with a million eyes pointing in every direction.
If you’re logged in to Facebook, Twitter, Google or Amazon, it’s safe to say these sites are tracking and retaining everything you’re doing on their sites and on any other sites that host their scripts and widgets. It’s how they make recommendations: for friends, products, events. Advertising targeters like Acxiom, Turn and BlueKai are tracking users in a different though equally invasive way. A newspaper’s website may know all the visitors to its site, but it knows nothing about their activities elsewhere. Its advertisers might, however, and Google Analytics certainly does: it offers a wide array of services to websites, tracking where users are coming from and what they search for before arriving at a page, all behind a slick interface. In exchange, Google gets to see the entire history of a site’s access logs. Google Analytics and similar services like Quantcast and comScore are so ubiquitous that most of your web browsing is likely captured by one or more of these companies. They don’t have your name, but they have your IP address, rough physical location and a good chunk of your activity online. This “raw data” is considered sensitive enough that many companies have policies limiting how long they retain it. Google, for example, boasts of anonymizing IP addresses after nine months and cookies after eighteen. But many more sites lack any such policy, and few websites will promise that your visit has not been permanently archived by some unaffiliated company.
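What “anonymizing” an IP address involves is not specified above. One common approach, shown here in Python purely as an assumption and not as a description of Google's method, is to zero out the final portion of an IPv4 address so that a log entry points to a block of addresses rather than a single one. The function name truncate_ip and the default of keeping 24 bits are illustrative choices.

```python
import ipaddress


def truncate_ip(ip, keep_bits=24):
    """Zero the low-order bits of an IPv4 address, keeping only its block."""
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)
```

Under this scheme 203.0.113.77 becomes 203.0.113.0. The individual address is gone, but the block remains, and blocks are what tie an address to an ISP and a rough location, so this kind of anonymization is a matter of degree.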