Can Languages Be Dated?

European genetics seems at present compatible with both theories of Indo-European spread. A more decisive test would be to put a date on when proto-Indo-European was spoken, since the two theories imply very different times of expansion. The Kurgan warrior expansion started some 6,000 years ago, the spread of farming from the Near East some 9,500 years ago.

The dating of languages is not yet a settled science. One approach is to estimate the rate of historical change in a group of languages by analyzing similarities in vocabulary. Glottochronology, one version of this method, depends on estimating the percentage of cognates that two languages have in common. (Cognates are words derived from a common ancestor; apple is a cognate of German's Apfel but not of French pomme.)

The cognates that glottochronologists examine are not chosen randomly but belong to special vocabularies, drawn up by the method's inventor, Morris Swadesh, from items that are particularly resistant to linguistic change. These include words for numbers, pronouns and parts of the body. A Swadesh list of 100 words is the most commonly used. In comparing two languages, a linguist will decide how many Swadesh-list words in each are true cognates with each other. The fewer cognates, the longer ago the languages diverged, and there are various methods of translating the percentage of matching cognates into a date of language split. In Ehret's view, a 5% match indicates a language split of about 10,000 years ago, a 22% agreement means a divergence around 5,000 years ago, and two languages that parted ways only 500 years ago will retain 86% of their Swadesh-list vocabulary in common.

Given the simplicity of the method, glottochronology can produce surprisingly plausible dates. But it has flaws. Linguists have put considerable effort into criticizing glottochronology, perhaps more than in trying to get it to work better. The result has been continuing disagreement among linguists as to whether it is a usable technique. At a conference held at Cambridge University in 1999, opinion ranged from one extreme to the other. Robert Blust, of the University of Hawaii, gave a paper explaining why the glottochronology kind of method "doesn't work" for Austronesian languages, and James Matisoff, of the University of California, Berkeley, talked about "the uselessness of glottochronology for the subgrouping of Tibeto-Burman." They were followed by Ehret, who explained how well glottochronology works for dating language splits in the Afroasiatic family.265

Historical linguists are much more enthusiastic about a quite different dating technique called linguistic paleontology. The idea is to reconstruct words for objects of material culture in a language family and date the language by noting the times at which such objects first appear in the archaeological record.
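The glottochronological step described above, translating a percentage of shared cognates into a date of language split, can be illustrated with a short sketch. The text does not say which formula Ehret uses; the code below assumes the classic Swadesh-style formula t = ln(c) / (2 ln(r)), where c is the fraction of shared cognates and r is the fraction of the list a language retains per thousand years. The value r = 0.86 is likewise an assumption, chosen because it roughly reproduces the three figures quoted in the text. The function and constant names are hypothetical.

```python
import math

# Assumption: fraction of the 100-word list that one language retains
# per 1,000 years. The text does not give this number; 0.86 is chosen
# because it roughly reproduces the figures quoted for Ehret.
RETENTION_PER_MILLENNIUM = 0.86


def years_since_split(shared_fraction, retention=RETENTION_PER_MILLENNIUM):
    """Estimate years since two languages diverged (illustrative only).

    Assumes the classic formula t = ln(c) / (2 * ln(r)), with t in
    millennia. The factor of 2 reflects that both daughter languages
    have been changing independently since the split.
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    millennia = math.log(shared_fraction) / (2.0 * math.log(retention))
    return 1000.0 * millennia


# The three percentages quoted in the text.
for percent in (86, 22, 5):
    years = years_since_split(percent / 100.0)
    print(f"{percent}% shared cognates -> about {years:,.0f} years")
```

Under these assumptions the three percentages come out close to 500, 5,000 and 10,000 years respectively. The sketch also makes the method's main weakness visible: a single fixed retention rate is applied to every language, which is exactly what critics dispute.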

In many Indo-European languages, for example, there are words for wheel that are clear cognates of each other. Greek has kuklos (a word that is also the origin of circle), Sanskrit cacras, Tokharian kukal, and Old English hweowol (initial "k"s in proto-Indo-European turn to "h" sounds in the Germanic family branch). Since the daughter languages of proto-Indo-European have cognate words for wheel, they must be derived from a common source, and linguists assert that this was the proto-Indo-European word for wheel, which they reconstruct as *kʷekʷlos (the asterisk indicates a reconstructed word). Now, the earliest known wheels in the archaeological record date from 3400 BC (5,400 years ago). The proto-Indo-European language must have split into its daughter languages sometime after this date, the argument goes, since how else could the daughter languages, spoken over an enormous region, all have cognate words for wheel? Similar arguments can be made for words like yoke, axle, and wool. Work on this issue by linguists like Bill Darden of the University of Chicago has encouraged many linguists in their belief that Indo-European was a single language as recently as 5,500 years ago and that its daughter languages could not have come into existence until after this date.266

Linguistic paleontology is an ingenious exercise of the linguist's craft. But it has two conceptual weaknesses. One is that a splendid new invention like the wheel is likely to spread like wildfire from one culture to the next, carrying its own name with it. Linguistic paleontologists claim they can spot such borrowed words. It's true that "Coca-Cola" is easy enough to recognize as a foreign borrowing in many languages, but the more ancient the borrowing, the more a word may take on the coloration of its host language. One of the criticisms linguists level at glottochronology is that it is confounded by unrecognized borrowed words.

Another weakness in linguistic paleontology is the danger of constructing highly plausible words that didn't, in fact, exist. Related words for bishop exist in Greek (episkopos), Latin (episcopus), Old English (bisceop), Spanish (obispo) and French (évêque), from which the proto-Indo-European word *apispek for bishop could be reconstructed; but of course, in a language spoken at least 5,000 years ago, no such word existed. As for wheel, proto-Indo-European is thought to have had a word *kʷel, meaning to turn or twist, of which *kʷekʷlos is assumed to be a duplication. But it could be that proto-Indo-European had no word for wheel, and what happened was that its daughter languages each independently used their inherited *kʷel/turn words to form their own words for wheel. In which case proto-Indo-European could have been spoken thousands of years before the invention of the wheel.

A New Date for Proto-


A better, more systematic way of dating languages has long been needed, and biologists hope they may have provided it by adapting one of their own methods for drawing phylogenetic trees. The favored approach is called a maximum likelihood method because it asks what is the most probable shape of tree to account for the observed data. In the case of language families, the data are each language's list of Swadesh words, along with a designation of which are cognates and which are not. The idea of applying a maximum likelihood method to language history was laid out by Mark Pagel, an evolutionary biologist at the University of Reading in England. Pagel showed that with a list of just 18 words he could generate a maximum likelihood tree for 7 languages (Welsh, Romanian, Spanish, French, German, Dutch and English) that was the same as the tree constructed by linguists with purely linguistic techniques.267

The method has now been further developed by Russell D. Gray, an evolutionary biologist at the University of Auckland in New Zealand. Gray has carefully analyzed the problems of glottochronology and adapted the method so as to address them. One of the problems is unrecognized borrowing. Unrecognized loan words make languages appear younger than they are. But they also knit the side branches of a language together, making a netlike structure. Netlike structures can be tested for and the offending words eliminated.

Another problem that has vexed glottochronology is that languages may evolve at different rates. Both modern Icelandic and Norwegian are known to have evolved from Old Norse, which was spoken between AD 800 and 1050. Norwegian and Old Norse have 81% of their Swadesh list words as cognates, correctly implying a separation of 1,000 years ago. But modern Icelandic, which has been much more isolated, shares 99% of its words with Old Norse, wrongly implying the two languages

