This weekend I appeared in an NPR article about Ian Maddieson and Christophe Coupe's work on the effects of climate and ecology on the sound systems of languages. I haven't read the study itself, but I did get to see the slides that Maddieson and Coupe presented this week at the Acoustical Society of America. Essentially, they find that speech sounds show high efficacy, that is, adaptation to being transmitted and received in the local ecology. Specifically, languages tend to be more sonorous (less 'consonant heavy') in warmer places with more tree cover. This makes sense, since these kinds of sounds are better at cutting through warm air and dense vegetation.
I was interviewed for about an hour, but the quotes from me in the piece are a bit out of context. For instance, I'm quoted as claiming that this is the first study of its kind, when there have been several studies that look at climate and language (including one I co-authored with Caleb Everett and Damian Blasi). But, to be fair, Maddieson and Coupe's study probably has the greatest range of ecological variables and the most sophisticated linguistic measure (although I'm not yet sure how they control for historical relatedness).
You can read about the study in the article above, but I wanted to address another thing I'm quoted as saying. The interviewer asked if it was possible to see these effects in a single language or a single speaker, and I said that it was very unlikely, but that I'd tried to do this with transcripts of Larry King. I went on to say that I absolutely wouldn't publish this because it was a crazy idea that nobody would believe.
But now the cat's out of the bag, so here's what I found.
Does Larry King use more vowels when it's warmer?
If we see language change as natural selection operating on individual utterances in conversation (à la Croft and others), then we should be able to see selective pressures at work in the utterances of an individual speaker. This should also apply to the influence of climate. Given enough data, you should be able to see an individual adapting over the seasons. In light of Maddieson and Coupe's hypothesis, speakers should use proportionately more vowels relative to consonants when it's warmer.
CNN provides transcripts for every show broadcast between 2000 and 2012. Larry King conducted an interview practically every day on 'Larry King Live' (which has been used before in linguistic studies), so I extracted Larry King's speech from the transcripts (excluding the guests' speech, mentions of the studio's location, and the guests' names). For each transcript I counted the frequency of each letter, then calculated the ratio of vowels (aeiouy) to consonants. Then, for each air date, I found the actual temperature and humidity for that date at the show's location (usually the CNN studios in LA). The show was occasionally recorded in Washington DC or New York; I tried to detect these cases automatically and matched them with the climate measures for the CNN studios in those cities.
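The letter counting and the weather matching can be sketched roughly as follows. This is a minimal illustration, not the script used for the analysis: it assumes King's speech has already been extracted into one string per air date, that each transcript has been tagged with a hypothetical location label ("LA", "DC", "NY"), and that the weather data sits in a dictionary keyed by (date, location). The function names are made up for the example.

```python
from collections import Counter

# The post counts "y" as a vowel alongside a, e, i, o, u.
VOWELS = frozenset("aeiouy")


def vowel_ratio(text: str) -> float:
    """Return the ratio of vowel letters to consonant letters in `text`.

    Only the ASCII letters a-z are counted (case-insensitively); digits,
    punctuation, whitespace and accented letters are ignored.
    """
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    vowels = sum(n for letter, n in counts.items() if letter in VOWELS)
    consonants = sum(counts.values()) - vowels
    if consonants == 0:
        raise ValueError("text contains no consonant letters")
    return vowels / consonants


def attach_weather(transcripts, weather):
    """Pair each transcript's vowel ratio with the climate on its air date.

    transcripts: iterable of (air_date, location, king_text) tuples, where
        location is a hypothetical label such as "LA", "DC" or "NY".
    weather: dict mapping (air_date, location) -> max temperature in deg C.
    Transcripts with no matching weather record are skipped.
    """
    rows = []
    for air_date, location, text in transcripts:
        key = (air_date, location)
        if key not in weather:
            continue  # no climate record for this date and studio
        rows.append({
            "date": air_date,
            "location": location,
            "vowel_ratio": vowel_ratio(text),
            "max_temp_c": weather[key],
        })
    return rows
```

Keying the weather on (date, location) rather than date alone is what lets the occasional Washington DC or New York broadcast pick up the climate of the studio it was actually recorded in.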
There are about 3,500 transcripts over 11 years, about 90% of which were aired on consecutive days. (I know nothing about this show, and am a bit surprised by its frequency! I'll have to check whether the transcripts include repeats.)
Here is a depiction of the results for temperature and humidity:
The black lines show King's vowel ratio (higher = more vowels), and the bars are 95% confidence intervals around the mean for each week of the year. The maximum and minimum temperatures are shown in red and blue. Below is a similar graph for specific humidity.
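The weekly means and intervals in graphs like these can be computed along the following lines. This sketch assumes the observations are pooled across years by ISO week number and that the interval is the normal approximation (mean plus or minus 1.96 standard errors); the original graphs may have used a different interval method.

```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def weekly_summary(observations, z=1.96):
    """Mean vowel ratio and an approximate 95% CI for each week of the year.

    observations: iterable of (datetime.date, vowel_ratio) pairs, pooled
        across years so that, e.g., week 32 of every year shares one bin.
    Returns {iso_week_number: (mean, lower, upper)}; the interval is
    mean +/- z * standard error, and is NaN for weeks with one observation.
    """
    by_week = defaultdict(list)
    for day, ratio in observations:
        by_week[day.isocalendar()[1]].append(ratio)

    summary = {}
    for week in sorted(by_week):
        ratios = by_week[week]
        centre = mean(ratios)
        if len(ratios) < 2:
            summary[week] = (centre, math.nan, math.nan)
            continue
        half_width = z * stdev(ratios) / math.sqrt(len(ratios))
        summary[week] = (centre, centre - half_width, centre + half_width)
    return summary


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    demo = [(date(2005, 1, 3), 0.40), (date(2006, 1, 4), 0.42), (date(2005, 8, 8), 0.41)]
    for week, (m, lower, upper) in weekly_summary(demo).items():
        print(week, round(m, 4), lower, upper)
```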
Surprisingly, there is some variation in the proportion of vowels, and it looks like there's a trend in the expected direction.
To analyse the data, I used a linear mixed-effects model predicting vowel ratio from (log) text length and maximum temperature, with random effects for year and for each week (580 separate weeks, to try to control for topical issues).
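Preparing the predictors for a model like this might look as follows. This is a hypothetical sketch: it assumes each row carries a date and a letter count, centres log text length on its overall mean, and builds one grouping label per calendar week (ISO year plus week number) so that every one of the roughly 580 weeks gets its own random intercept. In R's lme4, such a model could be written roughly as vowel_ratio ~ log_len_centered + max_temp_c + (1 | year) + (1 | week_id); the column names here are illustrative, not the ones from the original analysis.

```python
import math
from datetime import date


def build_predictors(rows):
    """Add model predictors to transcript rows.

    rows: list of dicts with a 'date' (datetime.date) and a 'text_length'
        (number of letters in King's speech; must be positive).
    Adds:
      log_len_centered: log(text_length) minus its mean over all rows,
      year: calendar year, used as one random-effect grouping,
      week_id: ISO year plus week number, e.g. "2005-W01", used as the
               other grouping, so each calendar week is its own level.
    """
    logs = [math.log(r["text_length"]) for r in rows]
    centre = sum(logs) / len(logs)
    out = []
    for row, log_len in zip(rows, logs):
        iso_year, iso_week, _ = row["date"].isocalendar()
        out.append({
            **row,
            "log_len_centered": log_len - centre,
            "year": row["date"].year,
            "week_id": f"{iso_year}-W{iso_week:02d}",
        })
    return out


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    demo = [{"date": date(2005, 1, 3), "text_length": 9000},
            {"date": date(2005, 7, 4), "text_length": 12000}]
    for r in build_predictors(demo):
        print(r["week_id"], round(r["log_len_centered"], 3))
```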
Adding maximum temperature significantly improves the fit of the model over a null model with only text length (Chi squared = 23.7, df = 1, p < 0.00001).
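For a single added fixed effect, the likelihood-ratio test reduces to a chi-squared test with one degree of freedom, and its p-value can be checked with nothing more than the complementary error function. The sketch below assumes the two log-likelihoods come from models fitted by maximum likelihood (not REML), which is the usual requirement when comparing models that differ in their fixed effects; the function names are illustrative.

```python
import math


def lrt_statistic(loglik_null: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared with 1 df.

    If Z ~ N(0, 1) then Z**2 ~ chi-squared(1), so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # The two statistics reported in the post, both with df = 1.
    for stat in (23.7, 11.59):
        print(stat, chi2_sf_df1(stat))
```

Run on the reported statistics, this gives a p-value of roughly 1e-6 for 23.7 and roughly 0.0007 for 11.59, in line with the figures quoted in the text.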
Model estimates:

| Term | Estimate | Std. Error | t value |
|---|---|---|---|
| (Intercept) | 4.100e-01 | 8.114e-04 | 505.3 |
| maxTempC.loc | 9.628e-05 | 1.973e-05 | 4.9 |
| text.total.log.center | 5.340e-03 | 8.400e-04 | 6.4 |
King uses proportionately more vowels when it's warmer. The effect is very small: on average, there is a difference of about 15 vowels in an hour of conversation between summer and winter. A model with location-specific maximum temperature fits better than one with LA maximum temperature only (Chi squared = 11.59, df = 1, p = 0.0007).
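A back-of-envelope way to turn the temperature coefficient into a count of vowels is sketched below. It assumes the outcome is the vowels-to-consonants ratio and that the number of consonants is held fixed, so the extra vowels are simply consonants × β × ΔT. The post doesn't give the seasonal temperature difference or the amount of speech per hour, so the inputs in the example are placeholders and are not meant to reproduce the figure of about 15 vowels.

```python
def extra_vowels(beta_per_degree: float, delta_temp_c: float, consonants: float) -> float:
    """Approximate change in vowel count implied by a linear temperature effect.

    Assumes the outcome is vowels / consonants and that the number of
    consonants is held fixed, so that
        delta_ratio  = beta * delta_T
        delta_vowels = consonants * delta_ratio
    """
    return consonants * beta_per_degree * delta_temp_c


# Estimate for maxTempC.loc in the mixed-model table (change in ratio per deg C).
BETA_MAX_TEMP = 9.628e-05

if __name__ == "__main__":
    # Hypothetical inputs: a 10 deg C seasonal difference and 10,000 consonants
    # per hour of speech. Substitute real figures before reading anything into it.
    print(extra_vowels(BETA_MAX_TEMP, 10.0, 10_000))
```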
Of course, temperature is not independent from day to day, so I also tried a lagged regression, predicting vowel ratio from total text length (textLength) and the maximum temperature at the recording location (maxTempC.loc) on the day of broadcast and on each of the four preceding days.
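The lagged predictors can be assembled as in the sketch below, which builds one row per broadcast holding the text length and the maximum temperature on that day and on each of the four previous calendar days. It is a hypothetical illustration: the dictionaries and function name are assumptions, and it shifts by calendar day rather than by row position, so a broadcast is dropped if any of its lagged temperatures is missing. The resulting rows could then be passed, with an intercept column added, to any ordinary least-squares routine such as numpy.linalg.lstsq.

```python
from datetime import date, timedelta


def lagged_rows(records, temp_by_date, max_lag=4):
    """Build response/predictor rows for a distributed-lag regression.

    records: {date: (vowel_ratio, text_length)} for each broadcast.
    temp_by_date: {date: max temperature in deg C at the recording location}.
    Returns a list of (vowel_ratio, [text_length, temp_lag0, ..., temp_lagN])
    where temp_lagk is the temperature k calendar days before the broadcast.
    Broadcasts missing any of those temperatures are dropped.
    """
    rows = []
    for day in sorted(records):
        ratio, text_length = records[day]
        temps = [temp_by_date.get(day - timedelta(days=k)) for k in range(max_lag + 1)]
        if any(t is None for t in temps):
            continue  # incomplete lag history for this broadcast
        rows.append((ratio, [text_length, *temps]))
    return rows


if __name__ == "__main__":
    # Hypothetical data: five days of temperatures and two broadcasts.
    temps = {date(2005, 7, 1) + timedelta(days=i): 20.0 + i for i in range(5)}
    broadcasts = {date(2005, 7, 5): (0.41, 9000), date(2005, 7, 2): (0.40, 8000)}
    print(lagged_rows(broadcasts, temps))
```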
Coefficients:

| Term | Estimate | Std. Error | t value | p value | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 4.076e-01 | 7.342e-04 | 555.073 | < 2e-16 | *** |
| d[lag.0, ]$textLength | 3.772e-07 | 6.715e-08 | 5.618 | 2.08e-08 | *** |
| d[lag.0, ]$maxTempC.loc | 1.008e-04 | 3.300e-05 | 3.055 | 0.00227 | ** |
| d[lag.1, ]$maxTempC.loc | 5.009e-05 | 3.812e-05 | 1.314 | 0.18893 | |
| d[lag.2, ]$maxTempC.loc | 1.549e-07 | 3.840e-05 | 0.004 | 0.99678 | |
| d[lag.3, ]$maxTempC.loc | -7.405e-05 | 3.809e-05 | -1.944 | 0.05195 | . |
| d[lag.4, ]$maxTempC.loc | -2.418e-06 | 3.300e-05 | -0.073 | 0.94159 | |

Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1.
Temperature on the actual day is still significant, taking previous days into account. Note that the coefficient becomes negative from a lag of three days onward, though only the lag-3 coefficient comes close to significance (p = 0.052). (Please forgive the rough analysis; it's all I have left after my computer broke recently.)
I'm not really sure what to make of this. Given the data above, there is an argument that King adapts the way he speaks to the climate. However, a LOT more needs to be done to show this. There are several confounding factors: the show is recorded in an air-conditioned studio; the topics or guests might differ between seasons; and there might be seasonal topics or keywords that affect the results (though one might argue that the lexicon for things related to cold climates has itself adapted). The graphs show big jumps around week 32, which I can't explain yet. Then there is the question of mechanism: how exactly is King adapting? By choosing alternative words, or constructions? The transcripts are also orthographic, not phonetic. And, of course, the idea is crazy.
Still, I think it's amazing that we're now in a position where we can even start asking these kinds of questions with data.