Does the weather affect the languages we speak?
This week, Caleb Everett, Damián Blasi and I have a paper out in PNAS (also available here) on the effects of humidity on the production and perception of lexical tone, and the resulting predictions about the distribution of tone across the world.
The basic principle behind studies of cultural evolution is that a selective pressure on communication can transform the structures of a language over time. What we explore is whether speaking in dry environments exerts a pressure to avoid using sounds that are more difficult to produce or comprehend, leading to those sounds being selected against.
Edit: See also this FAQ page
This isn't exactly a new idea. There have been previous studies linking the sonority of a language's phoneme inventory to climate variables, based on similar principles (here and here), and more recently, Caleb Everett looked at ejectives and altitude (which we've previously reviewed). However, our study uses a much larger sample than previous studies, and has stricter controls for genealogical and areal relationships.
The first part of our paper reviews a basic finding from various experiments: inhaling dry air dries out the vocal folds, leading to adverse effects in articulation, specifically aspects of phonation that are important for careful control of pitch. That is, in dry environments, distinctions in pitch come under a selective pressure.
To review tone: all languages use differences in pitch to communicate. In English, you can raise the pitch at the end of a sentence to turn a statement into a question or to highlight particular words in a sentence. However, it's easy to communicate in English without using pitch - you could still understand someone if they didn't vary the 'notes' in their sentence and spoke in a monotone like a robot. However, pitch is much more important in some languages, like Mandarin, where changing the pitch of a word can completely change its meaning. For example, 'ma' spoken with a level pitch means 'mother', while 'ma' spoken with a falling then rising pitch would mean 'horse'. Languages with contrasts between two types of tone are described as 'simple' tone languages, and those with more distinctions are described as 'complex' (the largest number of distinctions coded in our database is 12).
The main result in our paper is that there is a difference in the distribution of tonal and non-tonal languages according to climatic factors like average humidity: complex tonal languages are rare in dry places. The graph below shows the empirical distribution of different types of language (in the paper we show the graph from the WALS data, but the graph below is from ANU data, which we use in the Monte Carlo analysis):
Each curve starts at the bottom left and increases each time we observe a language at the given humidity, until we've observed all languages at the top right. We can see that we're less likely to observe languages with many contrastive tones in regions with lower humidity (the shaded section shows the lower quartile of humidity).
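The way each curve is built can be sketched in a few lines of Python. This is only an illustration of an empirical cumulative distribution, not the code used for the paper; the function names `ecdf` and `proportion_at_or_below` and the humidity values are made up for the example.

```python
def ecdf(humidities):
    """Return (humidity, cumulative proportion) points, sorted by humidity.

    The curve steps up by 1/n each time a language is observed, and
    reaches 1.0 once every language has been counted.
    """
    xs = sorted(humidities)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def proportion_at_or_below(humidities, threshold):
    """Proportion of languages observed at or below a humidity threshold."""
    return sum(h <= threshold for h in humidities) / len(humidities)


# Hypothetical humidity values, invented purely for illustration.
toy = [0.8, 0.2, 0.5, 0.5]
for humidity, proportion in ecdf(toy):
    print(humidity, proportion)
```

A curve that stays low over the shaded (dry) part of the humidity axis corresponds to a small value of `proportion_at_or_below` at the lower-quartile threshold.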
However, demonstrating this statistically wasn't straightforward. Note that the prediction is that there's a selective pressure against tone in dry environments - it doesn't make any prediction about the distributions in humid climates. That is, it's a uni-directional implication (discussed in a previous post). In this case, a simple regression approach is not suitable. Furthermore, the languages in our data are related historically, potentially inflating the correlation. We've written extensively on spurious correlations, but we'd like to point out that, in this case, we have a plausible mechanism which makes a concrete prediction.
Damián's ingenious solution was a Monte Carlo test, as illustrated below. First, we take a random sample of languages with complex tone - one from each language family, so we know they are independent. We look at the distribution of humidity for this sample, and work out the 15th percentile - the point below which 15% of the languages fall in the distribution. We then do the same for a sample of languages without complex tone. We can compare the difference in 15th percentile values for the two distributions of humidity - let's call this d. The prediction is that the complex-tone languages will have a higher 15th percentile than the non-complex-tone languages. So we run this procedure many times, and come up with a distribution of d values. We predict that the majority of values of d will be positive (distribution of complex tone languages is shifted to the right).
And indeed, that's what we found: there were differences in the climatic distributions of tonal and non-tonal languages in dry regions, but less so for more humid regions.
We also ran the same tests, but selecting languages by geographic region instead of language family, in order to demonstrate the results are not due to contact or borrowing (regions were defined according to Autotyp region, designed to reflect areas of linguistic contact). Again, we find that there are differences in drier regions, but not more humid regions (for humidity, proportion of samples where d is positive: 15th percentile: 96%; 25th percentile: 99%; 75th percentile: 68%).
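The resampling procedure described above can be sketched roughly as follows. This is a minimal illustration, not the analysis code from the paper: the function names, the `(group, humidity)` input format, the linear-interpolation percentile and the number of iterations are all assumptions made for the example, and the toy data are invented. The grouping key can be a language family or a geographic region.

```python
import random
from collections import defaultdict


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, by linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def group_humidities(languages):
    """Group humidity values by family (or region) label."""
    groups = defaultdict(list)
    for group, humidity in languages:
        groups[group].append(humidity)
    return list(groups.values())


def proportion_positive(complex_tone, other, q=15, n_iter=1000, seed=0):
    """Proportion of resamples in which d > 0.

    Each resample draws one language per group from each set, then
    d = percentile(complex-tone sample) - percentile(other sample).
    """
    rng = random.Random(seed)
    complex_groups = group_humidities(complex_tone)
    other_groups = group_humidities(other)
    positive = 0
    for _ in range(n_iter):
        complex_sample = [rng.choice(g) for g in complex_groups]
        other_sample = [rng.choice(g) for g in other_groups]
        d = percentile(complex_sample, q) - percentile(other_sample, q)
        positive += d > 0
    return positive / n_iter


# Hypothetical (family, humidity) pairs, invented purely for illustration.
complex_tone = [("A", 0.7), ("A", 0.8), ("B", 0.6), ("C", 0.9)]
other = [("D", 0.2), ("D", 0.7), ("E", 0.4), ("F", 0.8)]
print(proportion_positive(complex_tone, other, q=15))
```

Changing `q` to 25 or 75 gives the corresponding comparisons for other parts of the distribution.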
I also ran a serendipity check with the WALS data (although we have a prior motivating hypothesis, so we don't report this in the paper). I ran a chi-squared test comparing the distribution of tone types against whether the language is in the driest 25th percentile of MAT (chi-squared = 46.7, df = 2, p < 0.0001):
| WALS tone | Humidity 1st quartile (driest) | Humidity 2nd-4th quartile |
Then I did the same for every variable in WALS. Humidity had a stronger relationship with tone than 96% of variables did (with many of the top variables also being consistent with the overall theory). Of course, this doesn't control for language family, but it does suggest that the correlation is not spurious.
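For readers unfamiliar with the test, here is a rough sketch of a chi-squared test of independence on a contingency table such as tone type (rows) by humidity quartile group (columns). It is a generic textbook version, not the script used for the check above, and the function names are made up for the example. The p-value helper relies on the fact that, for exactly 2 degrees of freedom, the chi-squared upper tail probability is exp(-x/2).

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table.

    `table` is a list of rows of observed counts; every row and column
    total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of a chi-squared statistic, valid only for df = 2."""
    return math.exp(-stat / 2)


# Using the statistic reported in the post (chi-squared = 46.7, df = 2).
print(p_value_df2(46.7) < 0.0001)
```

A table with three tone types and two humidity groups has (3 - 1) x (2 - 1) = 2 degrees of freedom, which matches the df reported above.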
The paper and supporting information have a more detailed discussion of individual cases, and some more statistical checks.
So, it seems that there is reasonable support for the climate providing a selective pressure on the form of languages. Crucially, this selective pressure is present because of communicative needs - ease of production and comprehension. That is, this pressure can apply because the 'ecology' of language is use in interaction.
We could extend the idea of cultural selection further to suggest that one aspect of a language (tonal or non-tonal) can itself provide a knock-on selective pressure for other aspects of language. Recently, I did some work with Francisco Torreira and Harald Hammarström showing just this: linguists have argued that lexical tone and phrase-level intonation compete for the same linguistic resource (pitch), and we showed that languages with lexical tone are less likely to use intonation to distinguish questions versus statements.
Many studies of cultural evolution focus on cognitive selective pressures (e.g. processing, memory, frequency etc.), which are usually assumed to apply universally. The current study suggests that some pressures may not be universal, but only apply in particular situations. This adds to the literature on niche-specific cultural evolution, such as the effect of population size on morphological complexity, or demography on phoneme inventory.
The link between climate and language is still tentative, and we're planning a series of follow-up studies. Still, it opens up the possibility for a field of 'Geophonetics'.
Everett, Blasi, Roberts (2015) Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots. PNAS, pdf