This year's EvoLang is busy - around 100 talks in 4 parallel sessions and 40 posters. Replicated Typo is hosting a series of EvoLang previews to help people decide on what to go and see. If you'd like to post a preview of your own presentation, please get in touch with firstname.lastname@example.org.
Roberts, Dediu & Levinson. Detecting differences between the languages of Neanderthals and modern humans. Thursday, 17:45, session A.
Recently, Dediu & Levinson (2013) argued that, given recent genetic and archaeological evidence, the default assumption should be that Neandertals spoke modern languages (not protolanguages). Dediu will be giving a talk on this work in the same session. My talk will discuss whether there are methods that can test these ideas. Is there any way to estimate what Neandertal languages were like? It's a controversial topic, but could have big implications for the field.
The idea is the following: When two groups that speak different languages come into contact, bits and pieces of language are often borrowed. Linguists have used the similarities and differences between languages to reconstruct historical events such as contact between populations. Much of this work has been done on a case-by-case basis on existing or written languages, but there is also quantitative, large-scale work that reaches further back. For example, Russell Gray and colleagues have demonstrated that differences in basic vocabulary can predict ancient migrations in Polynesia.
We now know that there was contact between humans and Neandertals, and if Neandertals had modern language then we may have borrowed linguistic features from them. This could mean that there are bits and pieces in languages spoken today that were influenced by contact with Neandertals. We might expect these bits and pieces to be different from human languages, either because they were culturally adapted to Neandertal cognitive or physical biases (e.g. Neandertals had bigger mouths), or just because Neandertal languages explored different areas of the possible space of languages than human ones.
The task, then, is to find aspects of language that differ between populations that were in contact with Neandertals (Eurasia) and those that were not (most of Africa). However, this is harder than traditional historical linguistic analysis because (a) there is no record of Neandertal languages for comparison, (b) the events we are reconstructing happened much further back than in most analyses, and (c) in order for the Neandertal bits to 'survive', the rate of cultural change would have to be very slow.
Our first attempt at this analysis used the World Atlas of Language Structures. We used k-means clustering to split the world's languages into two groups so that languages within a group are more similar to each other than to languages in the other group. We then used permutation tests to determine whether the categories that come out of this procedure align better with the African/Non-African distinction than would be expected by chance. If so, this suggests that there is a statistical signal that we can utilise.
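To give a flavour of the procedure, here is a minimal sketch in Python. It is not the code used in the study: the feature vectors are made up rather than taken from WALS, the region labels are hypothetical, the k-means routine is a tiny two-cluster version with a deterministic starting point, and the alignment score (the share of languages whose cluster matches their region) is just one simple choice of test statistic.

```python
import random

# Made-up binary feature vectors (1 = feature present). NOT real WALS data.
africa = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0],
          [1, 0, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [0, 1, 1, 0, 0, 0]]
eurasia = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0],
           [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 1, 1]]
languages = africa + eurasia
regions = [0] * len(africa) + [1] * len(eurasia)  # 0 = Africa, 1 = Eurasia


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(vectors, iters=20):
    """Tiny k-means with k=2.

    Simplification: starts from the first vector and the vector farthest
    from it, instead of random restarts.
    """
    centroids = [list(vectors[0]),
                 list(max(vectors, key=lambda v: sq_dist(v, vectors[0])))]
    labels = []
    for _ in range(iters):
        # Assignment step: each language joins its nearest centroid.
        labels = [0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agreement(clusters, regions):
    """Share of languages whose cluster matches their region.

    Cluster numbering is arbitrary, so take the better of the two matchings.
    """
    same = sum(c == r for c, r in zip(clusters, regions)) / len(clusters)
    return max(same, 1 - same)


def permutation_p(clusters, regions, n_perm=2000, seed=0):
    """Permutation test: how often do shuffled region labels align as well?"""
    rng = random.Random(seed)
    observed = agreement(clusters, regions)
    shuffled = list(regions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if agreement(clusters, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


clusters = two_means(languages)
observed, p_value = permutation_p(clusters, regions)
print("agreement:", observed, "p-value:", p_value)
```

On this toy data the two clusters line up perfectly with the made-up regions, so the permutation p-value comes out small. Real typological data is far noisier, and related languages are not independent data points, which a simple shuffle like this ignores.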
We also used what we know about how languages are related historically to estimate the features of the founding language of a language family (e.g. Indo-European or Atlantic-Congo). This removes some of the noise from the present-day data and also gives us an estimate of how biased each language family is towards using a particular linguistic feature. We can then look for linguistic features that change slowly and where African and Eurasian language families have opposing cultural evolutionary biases. These are candidates for Neandertal language features. We don't expect that there is a single feature that stands out as being different between African and non-African languages, but rather that there might be a collection of features that are different. I'll demonstrate a method that uses Support Vector Machines to extract a list of candidate features, then test how many variables are needed to make
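As a rough illustration of how a Support Vector Machine can produce a ranked list of candidate features, here is a minimal Python sketch. It rests on several assumptions of mine: the data and the feature names (feat_0 to feat_6) are made up, the SVM is a small hand-written linear one trained by subgradient descent with no bias term, and ranking features by the absolute size of their weights is just one common way to read feature importance off a linear SVM. It is not necessarily how the analysis in the talk works.

```python
# Made-up binary features for 12 toy languages. NOT real typological data.
# The last column (feat_6) is absent everywhere, so it cannot discriminate.
africa = [[1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0],
          [1, 0, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0]]
eurasia = [[0, 0, 0, 1, 1, 1, 0], [0, 0, 1, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0, 0],
           [0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 1, 1, 0]]
X = africa + eurasia
y = [-1] * len(africa) + [1] * len(eurasia)  # -1 = Africa, +1 = Eurasia
names = ["feat_%d" % j for j in range(len(X[0]))]  # hypothetical names


def train_linear_svm(X, y, lam=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on the L2-regularised
    hinge loss (Pegasos-style step size 1 / (lam * t), no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        lr = 1.0 / (lam * t)
        # Languages on the wrong side of (or inside) the margin.
        violators = [(xi, yi) for xi, yi in zip(X, y)
                     if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1]
        for j in range(d):
            push = sum(yi * xi[j] for xi, yi in violators) / n
            # Shrink the weight (regularisation), then push towards violators.
            w[j] = (1 - lr * lam) * w[j] + lr * push
    return w


weights = train_linear_svm(X, y)

# Rank features by absolute weight: larger means more useful for telling
# the two regions apart in this toy data.
ranking = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
for j in ranking:
    print(names[j], round(weights[j], 3))
```

Features near the top of the list are the candidates, and the sign of a weight says which region the feature is associated with. Retraining on only the top few features would then show how many are needed before the classifier stops improving.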
Since the evidence is very indirect, the key to these claims is robustness. We run each analysis with 3 different phylogenetic trees, 3 types of branch length assumptions and 3 types of ancestral reconstruction models. This probably represents one of the widest-scale historical linguistic reconstructions yet attempted.
We also estimated the number of founding populations needed to account for the amount of linguistic diversity in Africa and Eurasia. This was done using the STRUCTURE program, which is normally used to estimate founding populations from genetic diversity. For languages, the best-fitting model is one with two founding populations. This is consistent with contact with Neandertals, but also with a lot of other explanations.
In the end, it looks like we don't currently have enough data of a high enough quality to answer these questions. However, I think it's interesting that the methods exist and that the questions raised by Dediu & Levinson are actually empirically testable. Furthermore, the data we need is not necessarily data on ancient languages, but current ones that are undocumented. That is, we can actually get the data we need. There are already several large-scale databases of languages that are being put together. It's an exciting time for linguistics.