The EvoLang Causal Graph Challenge

This year at EvoLang, I’m releasing CHIELD: The Causal Hypotheses in Evolutionary Linguistics Database.  It’s a collection of theories about the evolution of language, expressed as causal graphs.  The aim of CHIELD is to build a comprehensive overview of evolutionary approaches to language.  Hopefully it’ll help us find competing and supporting evidence, link hypotheses together into bigger theories and generally help make our ideas more transparent. You can access CHIELD right now, but hang around for details of the challenges.

The first thing that CHIELD can help express is the (sometimes unexpected) causal complexity of theories.  For example, Dunbar (2004) suggests that gossip replaced physical grooming in humans to support increasingly complicated social interactions in larger groups.  However, the whole theory is actually composed of 29 links, involving predation risk, endorphins and resource density:

The graph above might seem very complicated, but it was actually constructed just by going through the text of Dunbar (2004) and recording each claim about variables that were causally linked.  By dividing the theory into individual links it becomes easier to think about each part.
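The extraction process is easy to sketch in code: each causal claim becomes a directed edge, and chaining edges recovers the larger theory.  The variable names and links below are invented for illustration, not the actual CHIELD records:

```python
# Each tuple is one causal claim extracted from a text:
# (cause, effect). These variable names are illustrative only.
claims = [
    ("predation risk", "group size"),
    ("group size", "social complexity"),
    ("social complexity", "grooming time needed"),
    ("grooming time needed", "gossip"),
]

# Build an adjacency list from the individual links.
graph = {}
for cause, effect in claims:
    graph.setdefault(cause, []).append(effect)

def paths(graph, start, goal, path=None):
    """Enumerate all directed causal paths from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        found += paths(graph, nxt, goal, path)
    return found

# The larger theory emerges from chaining the individual links:
for p in paths(graph, "predation risk", "gossip"):
    print(" -> ".join(p))
```

Dividing a theory into edges like this is what makes it possible to query where two papers' graphs intersect.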

Second, CHIELD also helps find other theories that intersect with this one through variables like theory of mind, population size or the problem of freeriders, so you can also use CHIELD to explore multiple documents at once.  For example, here are all the connections that link population size and morphological complexity (9 papers so far in the database):

The first thing to notice is that there are multiple hypotheses about how population size and morphological complexity are linked.  We can also see at a glance that there are different types of evidence for each link.  Some are supported by multiple studies and methods, while others are currently just hypotheses without direct evidence.

However, CHIELD won’t work without your help!  CHIELD has built-in tools for you – yes YOU – to contribute.  You can edit data, discuss problems and add your own hypotheses.  It’s far from perfect and of course there will be disagreements.  But hopefully it will lead to productive discussions and a more cohesive field.

Which brings us to the challenges …

The EvoLang Causal Graph challenge: Contribute your own hypotheses

You can add data to CHIELD using the web interface.  The challenge is to draw your EvoLang paper as a causal graph.  It’s fun!  The first two papers to be contributed will become part of my poster at EvoLang.

Here are some tips:

  • Break down your hypothesis into individual causal links.
  • Try to use existing variable names, so that your hypothesis connects to other work.  You can find a list of variables here, or the web interface will suggest some.  But don’t be afraid to add new variables.
  • Try to add direct quotes from the paper to the “Notes” field to support the link.
  • If your paper is already included, do you agree about the interpretation? If not, you can raise an issue or edit the data yourself.

More help is available here.  Click here to add data now!  Your data will become available on CHIELD, and your name will be added to the list of contributors.

Bonus Challenge: Contribute 5 papers, become a co-author!

I’ll be writing an article about the database and some initial findings for the Journal of Language Evolution.  If you contribute 5 papers or more, then you’ll be added as a co-author.  As an incentive to contribute further, co-authors will be ordered by the number of papers they contribute.  This offer is open to anyone studying evolutionary linguistics, not just people presenting at EvoLang.  You should check first whether the paper you want to add has already been included.

Bonus Challenge: Contribute some code, become a co-author!

CHIELD is open source.  The GitHub repository for CHIELD has some outstanding issues. If you contribute some programming to address them, you’ll become a co-author on the journal article.

Robust, Causal, and Incremental Approaches to Investigating Linguistic Adaptation

We live in an age where we have more data on more languages than ever before, and more data from other domains to link it with. This should make it easier to test hypotheses involving adaptation, and also to spot new patterns that might be explained by adaptation.  For example, the proposed link between climate and tone languages could never have been investigated without massive global databases.  However, there is not much discussion of the overall approach to research in this area.

This week I published a paper in a special issue on the Adaptive Value of Languages, outlining the maximum robustness approach to these problems.  I then apply this approach to the debate about the link between tone and climate.

In a nutshell, I suggest that research should be:

  • Robust: Instead of aiming for the single most valid test of a hypothesis, we should consider as many sources of data and as many processes as possible.  Agreement between them supports a theory, but differences can also highlight which parts of a theory are weak.
  • Causal: Researchers should be more explicit about the causal effects in their hypotheses.  Formal tools from causal graph theory can help formulate tests, recognise weaknesses and avoid talking past each other.
  • Incremental: Realistically, a single paper can’t be the final word on a topic, and shouldn’t aim to be.  Statistical studies of large-scale, cross-cultural data are very complicated, and we should expect small steps towards establishing causality.

I apply these ideas to the debate about tone and climate.  Caleb Everett also published a paper in this issue showing that speakers in drier regions use vowels less frequently in their basic vocabulary.  I test whether the original link with tone and the new link with vowels hold up when using different data sources and different statistical frameworks.  The correlation with tone is not robust, while the correlation with vowels seems more promising.

I then suggest some ideas for alternative methodological approaches to this theory that could be tested.  For example:

  • An iterated artificial learning experiment
  • A phonetic study of vowel systems
  • A historical case-study of 5 Bantu languages
  • A corpus study of tone use in Cantonese and conversational repair in Mandarin
  • A corpus study of Larry King’s speech


Resister: A sci-fi sequel about cultural evolution and academic funding

In 2016, Casey Hattrey combined literary genres that had long been kept far apart from each other: science fiction, academic funding applications and cultural evolution theory. Space Funding Crisis I: Persister was a story that tried to “put the fun in academic funding application and the itch in hyper-niche”. It was criticised as “unrealistic and too centered on academics to be believable” and “not a very good book”. Dan Dediu’s advice was “better not even start reading it,” and Fiona Jordan’s review was literally a four-letter word. Still, that hasn’t stopped Hattrey from writing the sequel that the title of the first book tried to warn us about.

The badly conceived artwork for Resister

Space Funding Crisis II: Resister continues to follow the career of space linguist Karen Arianne. Just when she thought she’d gotten out of academia, the shadowy Central Academic Funding Council Administration pulls her back in for one more job. Or at least a part-time post-doc. Her mission: solve the mystery of the great convergence. Over thousands of years of space-faring, human linguistic diversity has exploded, but suddenly people have started speaking the same language. What could have caused this sinister twist? Who are the Panini Press? And what exactly is research insurance? Arianne’s latest adventure sees her struggle against ‘splainer bots, the conference mafia and her own inability to think about the future.

To say that this was the “difficult second book” would give too much credit to the first.  Hattrey seems to have learned nothing about writing or science since the last time they ventured into the weird world of self-published online novels. The characters have no distinct voice, the plot doesn’t make much sense and there are eye-watering levels of exposition.  In the appendix there’s even an R script which supports some of the book’s predictions, and even that is badly composed.  Even some of the apparently over-the-top futuristic ideas like insurance for research hypotheses are a bit behind existing ideas like using prediction markets for assessing replicability.

If there is a theme between the poorly formatted pages, then it’s emergence: complex patterns arising from simple rules. Arianne has a kind of spiritual belief in just reacting, Breitenberg-like, to the here-and-now rather than planning ahead. Apparently Hattrey intends this to translate into a criticism of the pressures of early-career academic life.  But this never really materialises out of the bland dialogue and insistence on putting lasers everywhere.

Still, where else are you going to find a book that makes fun of the slow science movement, generative linguistics and theories linking the emergence of tone systems to the climate?

Resister is available for free in various formats, including for Kindle, iPad and Nook.  The prequel, Persister, is also available (epub, kindle, iPad, nook).

Persister: Space Funding Crisis I  Resister: Space Funding Crisis II

CfP: Measuring Language Complexity at EvoLang

This is a guest post from Aleksandrs Berdicevskis about the workshop Measuring Language Complexity.

A lot of evolutionary talks and papers nowadays touch upon language complexity (at least nine papers did this at Evolang 2016). One of the reasons is probably that complexity is a very convenient testbed for testing hypotheses that establish causal links between linguistic structure and extra-linguistic factors. Do factors such as population size, or social network structure, or proportion of non-native speakers shape language change, making certain structures (for instance, those that are morphologically simpler) more evolutionarily advantageous and thus more likely? Or don’t they? If they do, how exactly?

Recently, quite a lot has been published on this topic, including attempts to do rigorous quantitative tests of the existing hypotheses. One problem that all such attempts face is that complexity can be understood in many different ways, and operationalized in yet more. And unsurprisingly, the outcome of a quantitative study depends on which measure you choose! Unfortunately, there is currently little consensus about how the measures themselves can be evaluated and compared.

To overcome this, we are organizing a shared task, “Measuring Language Complexity”, a satellite event of Evolang 2018, taking place in Torun on April 15. Shared tasks are widely used in computational linguistics, and we strongly believe they can prove useful in evolutionary linguistics, too. The task is to measure the linguistic complexity of a predefined set of 37 language varieties belonging to 7 families (and then to discuss the results, as well as their mutual agreement or disagreement, at the workshop). See the detailed CfP and other details here.

So far, the interest from the evolutionary community has been rather weak. But there is still time! We extended the deadline until February 28 and are looking forward to receiving your submissions!

CfP: Applications in Cultural Evolution, June 6-8, Tartu

Guest post by Peeter Tinits and Oleg Sobchuk
As mentioned in this blog before, evolutionary thinking can help the study of various cultural practices, not just language. The perspective of cultural evolution is currently seeing an interesting period of global growth and coordination – the widely featured founding of the Cultural Evolution Society (also on replicatedtypo), the recent inaugural conference and its follow-ups are bringing a diverse set of researchers around the same table. If this has gone past you unnoticed, there are some nice resources gathered on the society website.
Evolutionary thinking seems useful for various purposes. However, does it work the same everywhere, and can research progress in one domain be easily carried over to another?
To make better sense of this, we’re organizing a small conference to discuss the ways that evolutionary thinking can best be applied in different domains. The event “Applications in Cultural Evolution: Arts, Languages, Technologies” will take place on June 6-8 in Tartu, Estonia. Plenary speakers include:
We invite contributions from cultural evolution researchers of various persuasions and interests to talk about their work and how evolutionary models help with it. The deadline for abstracts is February 14.
Discussion of individual contributions will hopefully lead to a better understanding of the commonalities and differences in how cultural evolution is applied in different areas, and help build an understanding of how to use evolutionary thinking most productively – its prospects and its limitations. We aim to build common ground by providing plenty of space and opportunities for formal and informal discussion on site.
Both case studies and general perspectives are welcome. In addition to original research, we encourage participants to think about the following questions:
– What do you get out of cultural evolution research?
– How should we best apply evolutionary thinking to culture?
– What matters when we apply this to different domains or timescales?
Deadline for abstracts: February 14, 2018
Event dates: June 6-8
Location: Tartu University, Estonia
Full call for papers and information on the website. Also available as PDF.

Deadline extended for Triggers of Change in the Language Sciences

The 2nd XLanS conference, on Triggers of Change in the Language Sciences, has extended its submission deadline to June 14th.

This year’s topic is ‘triggers of change’:  What causes a sound system or lexicon or grammatical system to change?  How can we explain rapid changes followed by periods of stability?  Can we predict the direction and rate of change according to external influences?

We have also added two new researchers to our keynote speaker list, which now stands as:


Wh-words sound similar to aid rapid turn taking

A new paper by Anita Slonimska and myself attempts to link global tendencies in the lexicon to constraints from turn taking in conversation.

Question words in English sound similar (who, why, where, what …), so much so that this class of words are often referred to as wh-words. This regularity exists in many languages, though the phonetic similarity differs, for example:

English     Latvian     Yaqui       Telugu
haw         ka:         jachinia    elaa
haw mɛni    tsik        jaikim      enni
haw mətʃ    tsik        jaiki       enta
wət         kas         jita        eem; eemi[Ti]
wɛn         kad         jakko       eppuDu
wɛr         kuɾ         jaksa       eTa; eedi; ekkaDa
wɪtʃ        kuɾʃ        jita        eevi
hu          kas         jabesa      ewaru
waj         ˈkaːpeːts   jaisakai    en[du]ceeta; enduku

In her Master’s thesis, Anita suggested that these similarities help conversation flow smoothly.  Turn taking in conversation is surprisingly swift, with the usual gap between turns being only 200ms.  This is even more surprising when one considers that retrieving, planning and beginning to pronounce a single word takes around 600ms.  Therefore, speakers must begin planning what they will say before the current speaker has finished speaking – at least 400ms before the end of the current turn (as demonstrated by many recent studies, e.g. Barthel et al., 2017). Starting your turn late can be interpreted as uncooperative, or lead to missing out on a chance to speak.

Perhaps the harshest environment for turn-taking is answering a content question.  Responders must understand the question, retrieve the answer, plan their utterance and begin speaking.  It makes sense to expect that cues would evolve to help responders recognise that a question is coming.  Indeed there are many paralinguistic cues, such as rising intonation (even at the beginning of sentences) and eye gaze.  Another obvious cue is question words, especially when they appear at the beginning of question sentences. Slonimska hypothesised that wh-words sound similar in order to provide an extra cue that a question is about to be asked, so that the speaker can begin preparing their turn early.

We tried to test this hypothesis, firstly by simply asking whether wh-words really do have a tendency to sound similar within languages.  We combined several lexical databases to produce a word list for 1000 concepts in 226 languages, including question words.  We found that question words are:

  • More similar within languages than between languages
  • More similar than other sets of words (e.g. pronouns)
  • Often composed of salient phonemes

Of course, there are several possible confounds, such as languages being historically related, and many wh-words being derived from other wh-words within a language. We attempted to control for this using stratified permutation, excluding analysable forms, and comparing wh-words to many other sets of words, such as pronouns, which are subject to the same processes.  Not all languages have similar-sounding wh-words, but across the whole database the tendency was robust.
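The logic of the permutation comparison can be sketched as follows.  This is a toy version with made-up forms and a crude similarity measure – not the paper’s actual data, distance metric or stratification scheme:

```python
import itertools
import random

# Toy wh-word forms per language (illustrative, not the study's data).
wh = {
    "English": ["hu", "wat", "wen", "wer", "waj"],
    "Latvian": ["kas", "kad", "kur", "kurs", "kapets"],
    "Yaqui":   ["jabesa", "jita", "jakko", "jaksa", "jaisakai"],
}

def sim(a, b):
    """Crude similarity: proportion of shared initial segments."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))

def mean_within(words_by_lang):
    """Mean pairwise similarity of forms within each language."""
    sims = [sim(a, b)
            for forms in words_by_lang.values()
            for a, b in itertools.combinations(forms, 2)]
    return sum(sims) / len(sims)

observed = mean_within(wh)

# Permutation baseline: shuffle forms across languages and recompute.
random.seed(1)
all_forms = [w for forms in wh.values() for w in forms]
n_perm = 1000
greater = 0
for _ in range(n_perm):
    random.shuffle(all_forms)
    shuffled = {lang: all_forms[i * 5:(i + 1) * 5]
                for i, lang in enumerate(wh)}
    if mean_within(shuffled) >= observed:
        greater += 1

print(f"observed within-language similarity: {observed:.2f}")
print(f"permutation p ~ {greater / n_perm:.3f}")
```

The real analysis additionally stratifies the permutations by language family to control for historical relatedness, which a simple global shuffle like this does not do.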

Another prediction is that the wh-word cues should be more useful if they appear at the beginning of question sentences.  We tested this using typological data on whether wh-words appear in initial position.  While the trend was in the right direction, the result was not significant when controlling for historical and areal relationships.

Despite this, we hope that our study shows that it is possible to connect constraints from turn taking to macro-level patterns across languages, and then test the link using large corpora and custom methods.

Anita will be presenting an experimental approach to this question at this year’s CogSci conference.  We show that /w,h/ is a good predictor of questions in real English conversations, and that people actually use /w,h/ to help predict that a question is coming up.

Slonimska, A., & Roberts, S. G. (2017). A case for systematic sound symbolism in pragmatics: Universals in wh-words. Journal of Pragmatics, 116, 1-20. Article | PDF.

All data and scripts are available in this github repository.

Iconicity evolves by random mutation and biased selection

A new paper by Monica Tamariz, myself, Isidro Martínez and Julio Santiago uses an iterated learning paradigm to investigate the emergence of iconicity in the lexicon.  The languages were mappings between written forms and a set of shapes that varied in colour, outline and, importantly, how spiky or round they were.

We found that languages which begin with no iconic mapping develop a bouba-kiki relationship when the languages are used for communication between two participants, but not when they are just learned and reproduced.  The measure of the iconicity of the words came from naive raters.

Here’s one of the languages at the end of a communication chain, and you can see that the labels for spiky shapes ‘sound’ more spiky:

An example language from the final generation of our experiment: meanings, labels and spikiness ratings.

These experiments were actually run way back in 2013, but as is often the case, the project lost momentum.  Monica and I met last year to look at it again, and we did some new analyses.  We worked out whether each new innovation that participants created increased or decreased iconicity.  We found that new innovations are equally likely to result in higher or lower iconicity: mutation is random.  However, in the communication condition, participants re-used more iconic forms: selection is biased.  That fits with a number of other studies on iconicity, including Verhoef et al., 2015 (CogSci proceedings) and Blasi et al. (2017).

Matthew Jones, Gabriella Vigliocco and colleagues have been working on similar experiments, though their results are slightly different.  Jones presented this work at the recent symposium on iconicity in language and literature (you can read the abstract here), and will also present at this year’s CogSci conference, which I’m looking forward to reading:

Jones, M., Vinson, D., Clostre, N., Zhu, A. L., Santiago, J., Vigliocco, G. (forthcoming). The bouba effect: sound-shape iconicity in iterated and implicit learning. Proceedings of the 36th Annual Meeting of the Cognitive Science Society.

Our paper is quite short, so I won’t spend any more time on it here, apart from one other cool thing:  For the final set of labels in each generation we measured iconicity using scores from naive raters, but for the analysis of innovations we had hundreds of extra forms.  We used a random forest to predict iconicity ratings for the extra labels from unigrams and bigrams of the rated labels.  It accounted for 89% of the variance in participant ratings on unseen data.  This is a good improvement over older techniques such as using the average iconicity of the individual letters in the label, since a random forest allows the weighting of particular letters to be estimated from the data, and also allows for non-linear effects when two letters are combined.

However, it turns out that most of the prediction is done by a simple decision tree with just 3 unigram variables: labels were rated as spikier if they contained a ‘k’, ‘j’ or ‘z’ (our experiment was run in Spanish):

So the method was a bit overkill in this case, but might be useful for future studies.
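For illustration, the simple unigram rule can be written out directly.  The labels below are invented, and the ‘k’/‘j’/‘z’ rule is a stand-in for the fitted decision tree, not the model itself:

```python
def unigram_features(label, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Binary unigram features: does the label contain each letter?
    (The random forest also used bigrams; omitted here for brevity.)"""
    return {ch: ch in label for ch in alphabet}

def predict_spiky(label):
    """Toy stand-in for the fitted tree: labels containing
    'k', 'j' or 'z' are predicted to be rated spikier."""
    return any(ch in label for ch in "kjz")

# Invented labels, not from the experiment:
for label in ["kiki", "bouba", "zaja", "momo"]:
    print(label, predict_spiky(label))
```

The advantage of the full random forest over a hand-written rule like this is that the relevant letters, and any non-linear combinations of them, are estimated from the data rather than assumed.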

All data and code for doing the analyses and random forest prediction is available in the supporting information of the paper, or in this github repository.

Tamariz, M., Roberts, S. G., Martínez, J. I. and Santiago, J. (2017). The Interactive Origin of Iconicity. Cognitive Science. doi:10.1111/cogs.12497 [pdf from MPI]

Biggest linguistics experiment ever links perception with linguistic history

Back in March 2014, Hedvig Skirgård and I wrote a post about the Great Language Game.  Today we’ve published those results in PLOS ONE, together with the Game’s creator Lars Yencken.

One of the fundamental principles of linguistics is that speakers who are separated in time or space will start to sound different, while speakers who interact with each other will start to sound similar.  Historical linguists have traced the diversification of languages using objective linguistic measurements, but so far there has never been a widespread test of whether languages that are further apart on a family tree, or more physically distant from each other, actually sound more different to human listeners.

An opportunity arose to test this in the form of The Great Language Game: a web-based game where players listen to a clip of someone talking and have to guess which language is being spoken.  It was played by nearly one million people from 80 countries, and so is, as far as we know, the biggest linguistic experiment ever.  Actually, this is probably my favourite table I’ve ever published (note the last row):

Continent of IP address    Number of guesses
Europe                     7,963,630
North America              5,980,767
Asia                       841,609
Oceania                    364,390
South America              356,390
Africa                     74,032
Antarctica                 11

We calculated the probability of confusing any of the 78 languages in the Great Language Game for any of the others (excluding guesses about a language if it was an official language of the country the player was in).  Players were good at this game – on average getting 70% of guesses correct.
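For the curious, here is roughly how a confusion probability can be computed from raw guess records.  The records and field layout below are hypothetical, not the Great Language Game’s actual data format:

```python
from collections import Counter, defaultdict

# Toy guess records: (true language, guessed language).
guesses = [
    ("Swedish", "Norwegian"), ("Swedish", "Swedish"), ("Swedish", "Norwegian"),
    ("Italian", "Italian"), ("Italian", "Spanish"),
]

# Count guesses per true language.
counts = defaultdict(Counter)
for true_lang, guessed in guesses:
    counts[true_lang][guessed] += 1

def confusion(t, g):
    """P(guess = g | true = t): how often language t is taken for g."""
    total = sum(counts[t].values())
    return counts[t][g] / total if total else 0.0

print(f"P(Norwegian | Swedish) = {confusion('Swedish', 'Norwegian'):.2f}")
```

The inverse of probabilities like these can then be used as a perceptual distance matrix, which is what feeds the partial Mantel tests and the neighbour net below.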

Using partial Mantel tests, we found that languages are more likely to be confused if they are:

  • Geographically close to each other;
  • Similar in their phoneme inventories;
  • Similar in their lexicon;
  • Closely related historically (though this effect disappears when controlling for geographic proximity).

We also used Random Forests analyses to show that a language is more likely to be guessed correctly if it is often mentioned in literature, is the main language of an economically powerful country, is spoken by many people or is spoken in many countries.

We visualised the perceptual similarity of languages by using the inverse probability of confusion to create a neighbour net:

This diagram shows a kind of subway map for the way languages sound. The shortest route between two languages indicates how often they are confused for one another – so Swedish and Norwegian sound similar, but Italian and Japanese sound very different. The further you have to travel, the more different two languages sound.  So French and German are far away from many languages, since these were the best-guessed in the corpus.

The labels we’ve given to some of the clusters are descriptive, rather than being official terms that linguists use.  The first striking pattern is that some languages are more closely connected than others, for example the Slavic languages are all grouped together, indicating that people have a hard time distinguishing between them. Some of the other groups are more based on geographic area, such as the ‘Dravidian’ or ‘African’ cluster. The ‘North Sea’ cluster is interesting: it includes Welsh, Scottish Gaelic, Dutch, Danish, Swedish, Norwegian and Icelandic.  These diverged from each other a long time ago in the Indo-European family tree, but have had more recent contact due to trade and invasion across the North Sea.

The whole graph splits between ‘Western’ and ‘Eastern’ languages (we refer to the political/cultural divide rather than any linguistic classification). This probably reflects the fact that most players were Western, or at least could probably read the English website.  That would certainly explain the linguistically confused “East Asian” cluster.  There are also a lot of interconnected lines, which indicates that some languages are confused for multiple groups, for example Turkish is placed halfway between “West” and “East” languages.

It was also possible to create neighbour nets for responses from specific parts of the world. While the general pattern is similar, there are also some interesting differences.  For example, respondents from North America were quite likely to confuse Yiddish and Hebrew.  These come from different language families, but are both spoken by mainly Jewish populations, and this may form part of players’ cultural knowledge of these languages.

In contrast, players from Africa placed Hebrew with the other Afro-Asiatic languages.

Results like this suggest that perception may be shaped by our linguistic history and cultural knowledge.

We also did some preliminary analyses of the phoneme inventories of languages, using binary decision trees to explore which sounds make a language distinctive.  These identified some rare and salient features as critical cues to distinctiveness.

The future

The analyses were complicated because we knew little about the individual players beyond the country of their IP address.  However, Hedvig and I, together with a team from the Language in Interaction consortium (Mark Dingemanse, Pashiera Barkhuysen and Peter Withers), have created a version of the game called LingQuest that does collect information about players’ linguistic backgrounds.  It also asks participants to compare sound files directly, rather than using written labels.

You can download LingQuest as an Apple app, or play it online here.




Conference: Triggers of change in the language sciences

The University of Lyon 2 is proud to announce ‘Triggers of Change in the Language Sciences’.

October 11th-14th 2017, University of Lyon, France.

See the website for our call for papers and further details.

The conference is part of the “X in the Language Sciences” (XLanS) series which aims to bring a wide range of researchers together to focus on a particular topic in language that interests them.  The goal is to identify the crucial issues and connect them with cutting-edge techniques in order to develop better explanations of linguistic phenomena (see details of the first conference “Causality in the language sciences” here).

This year’s topic is ‘triggers of change’:  What causes a sound system or lexicon or grammatical system to change?  How can we explain rapid changes followed by periods of stability?  Can we predict the direction and rate of change according to external influences?

Our keynote speakers include:
Michael C. Gavin (Colorado State University)
Monica Tamariz (Heriot Watt University)
Sarah Thomason (University of Michigan)
Brigitte Pakendorf (University of Lyon)
Alan Yu (University of Chicago)
We are pleased to be able to offer scholarships to cover travel for students from the developing world and reduced rates for lower-income attendees.  See the Registration Details page for details.

The XLanS committee,

Christophe Coupé, Damián Blasi, Dan Dediu, Hedvig Skirgård, Julia Uddén, Seán Roberts