Peter Mark Roget, the nineteenth-century physician and polymath who invented the thesaurus, was the grandson of a Geneva clockmaker. His father, a Protestant minister, had immigrated to London in 1775. It seems appropriate that the scion of Calvinists and technicians would be the man to organize the English language into a thousand concepts, grouped into six classes that are further split into divisions and sections. Were you to need a synonym for "love," for instance, you would have found it lodged between concept No. 896, "Congratulation," and No. 898, "Hate," themselves lodged in the sixth class, "Affections"–not, say, "Matter," where the division "Sensation" resides, though Byron might have put it there; nor, say, "Intellect," where Dante might have put it.
The first reverse-engineered dictionary was published in 1852 in London, when Roget was 73, after a lifetime of compulsive list-making. According to his most recent biographer, Joshua Kendall–The Man Who Made Lists: Love, Death, Madness, and the Creation of Roget’s Thesaurus (Berkley; paper, $16)–Roget’s first list seems to have been "Dates of Deaths," which began with his father’s death date when Peter was 4. But Roget’s Thesaurus was also the fruit of an age whose mania was classification, and the class/division/section scheme of the book was the direct descendant of the phylum/class/order system first put in place in 1735 by Linnaeus to organize the plant and animal kingdoms. In so many ways, English was a forest full of flora and fauna; Roget was out to mold it into a botanical garden and zoo.
We still live in that world, with technology-driven semantic fields birthing whole species of new vocabulary annually. In lieu of a definitive answer to the question "How many words are there in English?" the Oxford English Dictionary Online has a chart listing tranches of vocabulary from most to least common, the percentage of the corpus each tranche represents, and example words. The hundred most common words (from, because, go, me, our…) account for 50 percent of the corpus; the thousand most common words (girl, win, decide…) account for 75 percent of the corpus; but at 99 percent of the corpus we may have a vocabulary of 1 million words, which include the likes of endobenthic and pomological. That 1 percent of extremely rare, specialized words is what takes us from the average American high-schooler’s vocabulary of 60,000 words to the endlessly receding horizon of ever more exotic, but extremely precise, terminology. English is a monster with a very long tail, and that’s why attempts to tame it–from Roget to Strunk and White–are vulnerable to poetic backlash. Even in the popular imagination our paragon remains cornucopian Shakespeare, who wrote before standardized spelling.
About 400 million people speak this monster English as a first language. Few people shed tears over dying languages displaced by English and other national languages. According to the Hans Rausing Endangered Languages Project, there are 6,500 languages in the world today, and half of them are bound for extinction within 50 to 100 years. Without the disarming smile of the African slender-snouted crocodile, the pathos of the Galapagos penguin or the splendor of the golden langur, it’s difficult to mount a telegenic campaign to preserve and promulgate endangered languages.