Ever since we learned about PRISM, the NSA’s secret project to collect metadata on Americans by tapping into commercial online services, we’ve been confounded by a tangle of intangible clashing values. We are asked to balance “preventing terrorism” against “protecting privacy.” It is hard to demonstrate what terrorism would have occurred without preventive measures, and privacy is as much a feeling as a circumstance.
A hypothetical versus an emotion: the invisibles clash at the coliseum. There is a danger that this crucial controversy is being framed in so blurry a manner that it will blend into the wind and blow away. Maybe reflecting on the terms will bring the situation into focus.
Metadata systems are said to gather only tags and skeletal information, but not “content.” The distinction is instrumental rather than substantial. The line between the two can shift over time. When someone retweets to a group of people, that generates only metadata and no data. In that case, which is a common one, the distinction becomes meaningless.
Metadata is the aspect of data that programs can most reliably “understand.” It’s the topical stuff that is regimented into a standard structure, like the blanks filled in on a form. In order to treat real-world events as metadata, certain actions of people, rather than their expressions, are used to fill in those blanks. For instance, programs cannot understand the meaning of ordinary conversation, but a program can log when a call is made, and to whom.
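To make the "blanks on a form" idea concrete, here is a minimal sketch of what a call log might look like to a program. The `CallRecord` type, its field names, the `log_call` helper and the phone numbers are all hypothetical, invented for illustration; nothing here describes how PRISM or any real system stores its records.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class CallRecord:
    """One row of call metadata: who called whom, when, and for how long.

    Hypothetical structure for illustration only.
    """
    caller: str
    callee: str
    started_at: datetime
    duration_seconds: int


def log_call(log, caller, callee, started_at, duration_seconds):
    """Append a structured record; the conversation itself is never stored."""
    record = CallRecord(caller, callee, started_at, duration_seconds)
    log.append(record)
    return record


call_log = []
log_call(call_log, "555-0100", "555-0199",
         datetime(2013, 6, 1, 14, 30, tzinfo=timezone.utc), 240)

# Every field is a filled-in blank that a program can sort, count, or join on.
print(asdict(call_log[0]))
```

Note what is absent: there is no field for what was said. The program records the action, not the expression.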
What we mean these days when we talk about security is preventing terrorist attacks. I was close by on 9/11, so I understand, though I wonder if we’ve become too narrow in our sensibility. Nonetheless, keeping to our nation’s narrowed sensibility, what can metadata do to prevent attacks?
I have no direct knowledge of PRISM, so I can only assume that what has been leaked about it is accurate. If this is so, then PRISM is probably like the many other metadata systems I have known in other spheres. In commerce, where there is at least as much talent and money as in the intelligence game, metadata’s primary strength is not investigative.
Companies have to be a little like the NSA in miniature on occasion. Major spammers, phishers and hucksters have to be shut down, for instance, in order for the Internet to be of use. Criminals learned long ago to spew deceptive metadata, so it can’t be reliably used to identify the bad guys. Spammy comments to blogs are generated by a web of fake or commandeered accounts, for example. Metadata has been a useful adjunct to investigations, to be sure, but effective detective work still relies on lucky breaks or, even more often, on old-fashioned human gumption.
Once you’ve identified a bad guy, metadata might (if it hasn’t been faked) help you find accomplices or top up your evidence of misdeeds. It has been widely pointed out that metadata didn’t detect the Boston Marathon bombers in advance. Closer cooperation with Russian authorities might have, however. After the fact, metadata might just help identify their accomplices and peers. Let’s see if it does.
It’s natural for techies to think in terms of advancing technology. Certainly, when technology becomes advanced enough, we imagine, there will be a button we can press that simply catches the bad guys. We want it so much that we sometimes hallucinate that we already have it. But we don’t. So while a metadata system comes on like a predator, it turns out to be sessile. It’s a filter feeder: it can catch only what it’s programmed to look for, so long as enough raw material flows through it and it’s lucky.
So why is Silicon Valley obsessed with metadata? Because it turns out to be an astoundingly efficient tool for social engineering, which means it has unbounded commercial value.
Silicon Valley and Metadata
As it happens, I remember the moment when one of the big Silicon Valley metadata powers discovered metadata. Jeff Bezos would give tours of the first Amazon warehouse when the company was a novel, tiny startup. On the shelves were accumulating piles of books awaiting completion of the orders to be mailed out to customers. One of these piles, which seemed to be Bezos’ eureka pile, consisted of the titles How to Make Love to a Man, How to Make Love to a Woman and Hawaii: The Ultimate Guide.
This reading list alone told you something about the couple who had ordered the books. They were probably young, probably not together for very long, probably from somewhat (though not extremely) reserved or conservative backgrounds, probably heterosexual (duh!), and probably about to go to Hawaii with a yen to have a lot of sex. It would not be surprising if the trip was a honeymoon. They were not too rich or else they would have been to Hawaii already, but they could afford to go now—hence, lower middle class. The titles were very mainstream, with no twist, so maybe the people were like that, too. If we looked at the address, we could glean even more. Some of what occurs to me undoubtedly reflects my own prejudices and perhaps exposes flaws in my perception of the world. For example, these customers feel white to me, but is that fair?
A sensitive person can easily read all this, whether correctly or not, from a sideways glance at a pile of books—but can algorithms? The short answer is no, or at least not yet, though we pretend the answer is yes.
People and machines make different characteristic mistakes with metadata. People screw up because profiling is crude. Maybe the customers were actually gay and black, spoiled trust fund babies at that, planning to stage an outrageous entertainment in a tiki bar in Pittsburgh.
The way algorithms screw up is bloodless and aimless. Maybe similar books were also ordered by someone who rents a few guesthouses in Hawaii, to restock a room from which similar titles had recently been pilfered. Our micro-hotelier might also have recently ordered a repair manual for an old Chevy truck. A metadata algorithm in the cloud might therefore send truck maintenance ads to the couple.
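The truck-ad mistake is easy to reproduce with a toy version of co-purchase matching. This is a sketch under loud assumptions: the customers, the `orders` table, the truck manual title and the `recommend` function are all made up for illustration, and no real retailer's recommendation system is claimed to work this way.

```python
from collections import Counter

# Hypothetical order histories; customer names and some titles are illustrative only.
orders = {
    "couple": {"How to Make Love to a Man", "How to Make Love to a Woman",
               "Hawaii: The Ultimate Guide"},
    "hotelier": {"How to Make Love to a Man", "How to Make Love to a Woman",
                 "Hawaii: The Ultimate Guide", "Chevy Truck Repair Manual"},
    "gardener": {"Roses for Beginners"},
}


def recommend(customer, orders):
    """Suggest items bought by customers whose baskets overlap with this one."""
    own = orders[customer]
    scores = Counter()
    for other, basket in orders.items():
        if other == customer:
            continue
        overlap = len(own & basket)
        if overlap == 0:
            continue  # no shared purchases, so no evidence of similarity
        for item in basket - own:
            scores[item] += overlap  # weight by how similar the baskets are
    return [item for item, _ in scores.most_common()]


print(recommend("couple", orders))  # ['Chevy Truck Repair Manual']
```

The algorithm has no idea why the baskets overlap. It sees three shared titles, concludes the two customers are alike, and passes the truck manual along.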
The blindness of metadata is supposedly addressed by big data. With enough data points, blind algorithms are supposed to approximate the intuition of people or better. The problem is that for any given person, there are rarely enough data for that to happen.
There was a time when online marketers thought they had a tool to find the perfect customers to suit a sale. They did not. If you suddenly need to unload your collection of rare disco sandals, it’s still up to you to list them on, say, eBay and for someone else to search for them. Despite all the sophistication and the vastness, metadata systems still can’t go out in a directed manner to find the perfect buyer and seller. (This is the commercial mirror of the failure of metadata systems to catch the Boston bombers in advance.)
Metadata has been a fad ever since computing got cheap enough that one could build giant remote computers to process huge amounts of it. Silicon Valley was fixated for a while, back in the 1990s, on the metaphor of the needle in the haystack. No more.
Social Engineering With Metadata
Metadata broke its original promise of finding a needle in a haystack, but instead offered a more lucrative gift than anyone knew to hope for. Moving haystacks turns out to be more valuable than finding needles. It turns out that metadata can very slowly, over time, get better-than-random results in generating a series of slight manipulations.
Metadata has proven to be a tool for certain kinds of behavioral change. Facebook can use metadata to find people who are more likely to agree to share information with each other, because they share history with each other anyway. This, in turn, increases the amount of metadata available to the algorithms. Once enough people are signed up, a new sphere of social mores is created and even more information is shared.
Metadata schemes always turn out to be more social engineering than artificial intelligence. In a few short years, because of Facebook, young people all over the world have started to think of privacy as something staid, from the fossilized world of their parents. What other technique could have effected this change?
The unreliability of metadata leads to a broad-brush form of targeting, meaning that all the competitors perform about as well when they use it. Not one of them has discovered a clever way to outperform the others as detectives or interpreters of human lives. Instead, they extract multitudes of tiny transactions, just by being there, putting links in front of people. Sessile.
Is PRISM Worth Worrying About?
The question that must be asked is how the peculiar, specific kind of power granted by having a big computer that processes metadata could be abused by a government. The answer to this question will tell us what we should fear about PRISM.
I find antiquarian metaphors useful as a means to counteract the vanities of present-day technophile triumphalism. So: a metadata system is more like a logbook than an automated Sherlock Holmes, though once in a while a real human Sherlock Holmes might scrutinize a logbook.
The abuse of a system like PRISM would probably not look like a high-tech Stasi. PRISM is unlikely to be optimal for suggesting who should be dragged off in the night by a police state. (Perhaps the senselessness and randomness of violence is part of what keeps violent regimes in power, but that is another question.) Instead, PRISM could be the preamble to a regime of slow, subtle conformity.
A government could engage in undetectable targeted repression using metadata. Something like PRISM might be hooked up to a secret civil defense mechanism in which certain people are subject to a couple of extra steps when they apply for a loan, for instance. Our enshrined secret FISA court would hear that those who might be linked to terrorists—albeit in only an exceedingly weak, insubstantial, statistical sense—must be investigated before buying up American real estate. (That could be part of a terrorist plot, after all.)
If you are a conservative, please read this paragraph: a misuse of something like PRISM might manifest itself as a slight increase in the difficulty with which conservatives can get credit. The difference would be just slight enough to be almost impossible to document definitively. And yet over enough time, the results would be substantial.
If you are a progressive, please read this paragraph: a misuse of something like PRISM might manifest itself as a slight increase in the difficulty with which progressives can get credit. The difference would be just slight enough to be almost impossible to document definitively. And yet over enough time, the results would be substantial.
Metadata systems can turn the kind of broad, almost subconscious mechanisms that have persistently held back African-Americans and Native Americans into a science. It’s been hard to say exactly why certain demographics and neighborhoods seem never to get ahead. Analysis can sometimes be harder than engineering.
Metadata as a tool of political power would paint with such a broad brush that you’d barely be able to notice it. If only people who oppose a war are targeted, for instance, a judicious number of hawks would also be drawn into the dragnet. The effect would be slight and statistical, but with the power of compound interest over time. Metadata could perfect plausibly deniable discrimination.
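The arithmetic of compounding shows how a barely noticeable effect could add up. The sketch below assumes two groups that start out equal, with one of them suffering a small multiplicative penalty each year. The 1 percent figure and the `relative_position` function are hypothetical, chosen only to illustrate the shape of the curve, not to estimate any real effect.

```python
def relative_position(annual_penalty, years):
    """Ratio of a slightly disadvantaged group's wealth to a baseline group's.

    Assumes both groups start equal and the penalty applies
    multiplicatively once per year (an illustrative simplification).
    """
    return (1.0 - annual_penalty) ** years


# A hypothetical 1 percent yearly drag, viewed over one, ten and twenty years.
for years in (1, 10, 20):
    print(years, round(relative_position(0.01, years), 3))
```

Under these assumptions, a drag too small to document in any single year leaves the targeted group about 18 percent behind after two decades.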
In a sense, it already has. Metadata is a slow, relentless concentrator of wealth and power for those who run the computers best able to calculate with it. The only form of targeting that is absolutely reliable is distinguishing those who run the biggest computers from everyone else. The former group can concentrate tremendous wealth, while the latter group lags behind. The rise of big computers is a primary engine of the rise of the 1 percent. Therefore, metadata scheming could probably also be applied to subtly align a population with a government over some years. After a decade or two, the political opposition might be poor, divided, cranky and ineffective.
The proof is already with us. Young people, weaned on free Internet services that spy on them, seem to have accepted an America in which their financial prospects are reduced, and in which no one should expect “privacy.” The acquiescence of our young people is historically exceptional and bizarre. In the metadata age, privacy needs a new definition, and it might be “freedom from being profiled.” Or “equity with those who use the biggest computers.”
Metadata is a more natural tool for a plutocracy than for a junta. Metadata is not a tiger; it’s a barnacle. But don’t underestimate a barnacle. Tigers are endangered, while the ship of state is becoming more encrusted every day.