Following Sean’s comment about using a regression to see how the prediction of phoneme inventory size improves when geographic spread is included alongside population size, I decided to look at the stats a bit more closely (original post is here). It’s fairly easy to perform multiple regression in R. For my data, the intercept, area and population all came out as highly significant (p<0.001), although the model explains only around 11% of the variance (residual standard error = 9.633 on 393 degrees of freedom; adjusted R-squared = 0.1084). I then plotted scatterplots for every pair of variables. As you can see below, this is fairly useful as a quick summary, but it is also messy and confusing. Another problem is that the pairs plot shows the original data and not the fitted linear model.
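For anyone who wants to see what a model of this kind involves, here is a minimal sketch of a two-predictor ordinary least squares fit, written in plain Python rather than R. It is not the analysis reported above: the data are synthetic, the names `fit_ols`, `log_population` and `log_area` are hypothetical, and the log transformation of the predictors is my assumption. The sample size of 396 is chosen only because three coefficients and 393 residual degrees of freedom imply n = 396.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Fit response ~ intercept + predictors via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)

    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    rss = sum((y - f) ** 2 for y, f in zip(response, fitted))
    mean_y = sum(response) / n
    tss = sum((y - mean_y) ** 2 for y in response)

    df_residual = n - k  # observations minus estimated coefficients
    r_squared = 1 - rss / tss
    return {
        "coefficients": beta,
        "df_residual": df_residual,
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / df_residual,
        "residual_se": math.sqrt(rss / df_residual),
    }


# Synthetic stand-in data (NOT the real language sample).
rng = random.Random(0)
log_population = [rng.uniform(2, 18) for _ in range(396)]
log_area = [rng.uniform(1, 14) for _ in range(396)]
phonemes = [
    20 + 0.8 * p + 0.3 * a + rng.gauss(0, 9)
    for p, a in zip(log_population, log_area)
]

model = fit_ols(list(zip(log_population, log_area)), phonemes)
print("intercept, population, area:", model["coefficients"])
print("residual SE:", model["residual_se"], "on", model["df_residual"], "df")
print("adjusted R-squared:", model["adj_r_squared"])
```

The adjusted R-squared penalises the plain R-squared for the number of coefficients, which is why it is the figure to compare when asking whether adding area to population actually improves the model.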
It has long been established that demography drives evolutionary processes (see Hawks, 2008 for a good overview). Attempts are also being made to explain cultural (Shennan, 2000; Henrich, 2004; Richerson & Boyd, 2009) and linguistic (Nettle, 1999a; Wichmann & Homan, 2009; Vogt, 2009) processes by considering the effects of population size and other demographic variables. Even though these ideas are hardly new, until recently there was a ceiling on the amount of resources one person could draw upon. In linguistics, this paucity of data is being remedied through large-scale projects, such as WALS, Ethnologue and UPSID, that bring together a vast body of linguistic fieldwork from around the world. A recent study by Lupyan & Dale (2010) provides a solid direction for how this might be utilised. Here, the authors compare the structural properties of more than 2000 languages with three demographic variables: a language’s speaker population, its geographic spread and its number of linguistic neighbours. The salient point is that certain differences in structural features correspond to the underlying demographic conditions.
With that said, a few months ago I found myself wondering about a particular feature, phoneme inventory size, and its potential relationship to the underlying demographic conditions of a speech community. What piqued my interest was that two languages I retain a passing interest in, Kayardild and Pirahã, both have small phonological inventories and small speaker communities. The question is: is there a correlation between the population size of a language and its number of phonemes? Despite work hinting at such a relationship (e.g. Trudgill, 2004), there is little in the way of empirical evidence to support such claims. Hay & Bauer (2007) is perhaps the most comprehensive attempt at an investigation, reporting a statistical correlation between the number of speakers of a language and its phoneme inventory size.
In their paper, the authors provide some evidence for the claim that the more speakers a language has, the larger its phoneme inventory. I won't go into the sub-divisions of vowels (e.g. separating monophthongs, extra monophthongs and diphthongs) and consonants (e.g. obstruents), as that would extend the post by about 1000 words. The key result is that vowel inventory and consonant inventory are both correlated with population size, and the authors also rule out the possibility that language families are driving the results. As they note:
That vowel inventory and consonant inventory are both correlated with population size is quite remarkable. This is especially so because consonant inventory and vowel inventory do not correlate with one another at all in this data-set (rho=.01, p=.86). Maddieson (2005) also reports that there is no correlation between vowel and consonant inventory size in his sample of 559 languages. Despite the fact that there is no link between vowel inventory and consonant inventory size, both are significantly correlated with the size of the population of speakers.
Using their paper as a springboard, I decided to look at how other demographic factors might influence the size of the phoneme inventory, namely: population density and the degree of social interconnectedness.