The Meta Question
Ever since we learned about PRISM, the NSA’s secret project to collect metadata on Americans by tapping into commercial online services, we’ve been confounded by a tangle of intangible clashing values. We are asked to balance “preventing terrorism” against “protecting privacy.” It is hard to demonstrate what terrorism would have occurred without preventive measures, and privacy is as much a feeling as a circumstance.
A hypothetical versus an emotion: the invisibles clash at the coliseum. There is a danger that this crucial controversy is being framed in so blurry a manner that it will blend into the wind and blow away. Maybe reflecting on the terms will bring the situation into focus.
Metadata systems are said to gather only tags and skeletal information, but not “content.” The distinction is instrumental rather than substantial. The line between the two can shift over time. When someone retweets to a group of people, that generates only metadata and no data. In that case, which is a common one, the distinction becomes meaningless.
Metadata is the aspect of data that programs can most reliably “understand.” It’s the topical stuff that is regimented into a standard structure, like the blanks filled in on a form. In order to treat real-world events as metadata, certain actions of people, rather than their expressions, are used to fill in those blanks. For instance, programs cannot understand the meaning of ordinary conversation, but a program can log when a call is made, and to whom.
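To make the form metaphor concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the CallRecord type, its field names and the helper functions are hypothetical and are not drawn from any real system. The point is only that the record has blanks for who, whom and when, and no blank at all for what was said.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call log entry: each field is one blank on the form."""
    caller: str
    callee: str
    started_at: datetime
    duration_seconds: int
    # There is deliberately no field for what was said. The conversation
    # itself never enters the record.


def log_call(caller, callee, started_at, duration_seconds):
    """Fill in the form for one call (illustrative helper)."""
    return CallRecord(caller, callee, started_at, duration_seconds)


def calls_to(records, callee):
    """A program can answer 'who called this number?' without
    understanding a word of any conversation."""
    return [r for r in records if r.callee == callee]


def form_blanks():
    """List the blanks the form offers, in order."""
    return [f.name for f in fields(CallRecord)]
```

A query like calls_to works on the structure alone, which is why this kind of record is the part of data that programs handle most reliably.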
What we mean these days when we talk about security is preventing terrorist attacks. I was close by on 9/11, so I understand, though I wonder if we’ve become too narrow in our sensibility. Nonetheless, keeping to our nation’s narrowed sensibility, what can metadata do to prevent attacks?
I have no direct knowledge of PRISM, so I can only assume that what has been leaked about it is accurate. If this is so, then PRISM is probably like the many other metadata systems I have known in other spheres. In commerce, where there is at least as much talent and money as in the intelligence game, metadata’s primary strength is not investigative.
Companies have to be a little like the NSA in miniature on occasion. Major spammers, phishers and hucksters have to be shut down, for instance, in order for the Internet to be of use. Criminals learned long ago to spew deceptive metadata, so it can’t be reliably used to identify the bad guys. Spammy comments to blogs are generated by a web of fake or commandeered accounts, for example. Metadata has been a useful adjunct to investigations, to be sure, but effective detective work still relies on lucky breaks or, even more often, on old-fashioned human gumption.
Once you’ve identified a bad guy, metadata might (if it hasn’t been faked) help you find accomplices or top up your evidence of misdeeds. It has been widely pointed out that metadata didn’t detect the Boston Marathon bombers in advance. Closer cooperation with Russian authorities might have, however. After the fact, metadata might just help identify their accomplices and peers. Let’s see if it does.
It’s natural for techies to think in terms of advancing technology. Certainly, when technology becomes advanced enough, we imagine, there will be a button we can press that simply catches the bad guys. We want it so much that we sometimes hallucinate that we already have it. But we don’t. So while a metadata system comes on like a predator, it turns out to be sessile. It’s a filter feeder: it can catch only what it’s programmed to look for, so long as enough raw material flows through it and it’s lucky.
So why is Silicon Valley obsessed with metadata? Because it turns out to be an astoundingly efficient tool for social engineering, which means it has unbounded commercial value.
Silicon Valley and Metadata
As it happens, I remember the moment when one of the big Silicon Valley metadata powers discovered metadata. Jeff Bezos would give tours of the first Amazon warehouse when the company was a novel, tiny startup. On the shelves were accumulating piles of books awaiting completion of the orders to be mailed out to customers. One of these piles, which seemed to be Bezos’ eureka pile, consisted of the titles How to Make Love to a Man, How to Make Love to a Woman and Hawaii: The Ultimate Guide.
This reading list alone told you something about the couple who had ordered the books. They were probably young, probably not together for very long, probably from somewhat (though not extremely) reserved or conservative backgrounds, probably heterosexual (duh!), and probably about to go to Hawaii with a yen to have a lot of sex. It would not be surprising if the trip was a honeymoon. They were not too rich or else they would have been to Hawaii already, but they could afford to go now—hence, lower middle class. The titles were very mainstream, with no twist, so maybe the people were like that, too. If we looked at the address, we could glean even more. Some of what occurs to me undoubtedly reflects my own prejudices and perhaps exposes flaws in my perception of the world. For example, these customers feel white to me, but is that fair?
A sensitive person can easily read all this, whether correctly or not, from a sideways glance at a pile of books—but can algorithms? The short answer is no, or at least not yet, though we pretend the answer is yes.
People and machines make different characteristic mistakes with metadata. People screw up because profiling is crude. Maybe the customers were actually gay and black, spoiled trust fund babies at that, planning to stage an outrageous entertainment in a tiki bar in Pittsburgh.
The way algorithms screw up is bloodless and aimless. Maybe similar books were also ordered by someone who rents a few guesthouses in Hawaii, to restock a room from which similar titles had recently been pilfered. Our micro-hotelier might also have recently ordered a repair manual for an old Chevy truck. A metadata algorithm in the cloud might therefore send truck maintenance ads to the couple.
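The truck-ad mistake can be sketched in a few lines of Python. This is a toy co-purchase matcher under stated assumptions, not a description of how any real retailer works: the customer names, the recommend function, the min_overlap threshold and the truck manual's title are all hypothetical, and for simplicity the hotelier orders the same three titles rather than merely similar ones.

```python
from collections import Counter

# Hypothetical order histories, invented for illustration.
ORDERS = {
    "couple": {
        "How to Make Love to a Man",
        "How to Make Love to a Woman",
        "Hawaii: The Ultimate Guide",
    },
    "hotelier": {
        "How to Make Love to a Man",
        "How to Make Love to a Woman",
        "Hawaii: The Ultimate Guide",
        "Chevy Truck Repair Manual",  # hypothetical title
    },
}


def recommend(customer, orders, min_overlap=2):
    """Suggest titles bought by customers whose orders overlap with ours.

    The matcher sees only which titles co-occur. It has no notion of
    why anyone bought anything.
    """
    mine = orders[customer]
    suggestions = Counter()
    for other, theirs in orders.items():
        if other == customer:
            continue
        if len(mine & theirs) >= min_overlap:
            # Treat the other customer as "similar" and borrow
            # whatever they bought that we have not.
            for title in theirs - mine:
                suggestions[title] += 1
    return [title for title, _ in suggestions.most_common()]


print(recommend("couple", ORDERS))
```

Run as written, the couple is offered the truck manual, because the overlap in titles is all the matcher can see. Nothing in the data distinguishes a honeymoon from a guesthouse restocking.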
The blindness of metadata is supposedly addressed by big data. With enough data points, blind algorithms are supposed to approximate the intuition of people, or even surpass it. The problem is that for any given person, there are rarely enough data for that to happen.
There was a time when online marketers thought they had a tool to find the perfect customers to suit a sale. They did not. If you suddenly need to unload your collection of rare disco sandals, it’s still up to you to list them on, say, eBay and for someone else to search for them. Despite all the sophistication and the vastness, metadata systems still can’t go out in a directed manner to find the perfect buyer and seller. (This is the commercial mirror of the failure of metadata systems to catch the Boston bombers in advance.)
Metadata has been a fad ever since computing got cheap enough that one could build giant remote computers to process huge amounts of it. Silicon Valley was fixated for a while, back in the 1990s, on the metaphor of the needle in the haystack. No more.