Last week in an EU:Sci podcast, Christos Christodoulopoulos challenged me to find a correlation between the basic word order of the language people use and the number of children they have. This was off the back of a number of spurious correlations with which readers of Replicated Typo will be familiar. Here are the results!
First, I do a straightforward test of whether word order is correlated with the number of children you have. This comes out as significant! I wonder if having more children hanging around affects the adaptive pressures on language? However, I then show that this result is undermined by discovering that there are other linguistic variables that are even better predictors.
I used the World Values Survey: a large database of survey results from thousands of people around the world, including what language they speak and how many children they have. I then linked this up with linguistic typology data from the World Atlas of Language Structures. This includes information on the basic word order of each language.
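As a rough illustration of the linking step, here is a minimal Python sketch. The respondent rows, the field layout and the helper names (`link`, `mean_children_by_order`) are all invented for this example and are not taken from either database; the word orders for the three languages are only a tiny hand-typed extract. It joins each respondent to a word order by language and then averages the number of children per word order.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: (respondent id, language, number of children).
# These values are invented for illustration, not real survey data.
respondents = [
    (1, "English", 2),
    (2, "English", 1),
    (3, "Japanese", 3),
    (4, "Japanese", 2),
    (5, "German", 1),
    (6, "Unknown", 4),  # no typology entry, so this row is dropped
]

# Hypothetical extract of a typology table: language -> basic word order.
word_order = {
    "English": "SVO",
    "Japanese": "SOV",
    "German": "No dominant order",
}


def link(rows, typology):
    """Attach the word order to each respondent whose language is known."""
    return [
        (rid, lang, kids, typology[lang])
        for rid, lang, kids in rows
        if lang in typology
    ]


def mean_children_by_order(linked_rows):
    """Average number of children for each word-order category."""
    groups = defaultdict(list)
    for _rid, _lang, kids, order in linked_rows:
        groups[order].append(kids)
    return {order: mean(kids) for order, kids in groups.items()}


linked = link(respondents, word_order)
print(mean_children_by_order(linked))
```

Respondents whose language has no typology entry simply fall out of the linked data, which is one reason the usable sample can be much smaller than the full survey.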
The hypothesis was that people who used particular basic word orders would have more children. Testing this hypothesis directly, basic word order is a significant predictor of the number of children a person has (linear regression, controlling for age, sex, if the person was married, if they were employed, their level of education and religion, t-value for basic word order = -18.179, p < 0.00001, model predicts 36% of the variance). It turns out that speakers of SOV languages have more children than speakers of SVO languages, while speakers with no dominant order have the fewest children on average (there wasn't enough data for other word order types). Indeed, in the strict interpretation of Evolution, SOV order is the fittest variant, since it is linked with having more offspring:
An explanation might be found in information theory: When hearing a sentence, you want the most unpredictable information at the start. If you only have one child, then if you know the sentence is about them (the Subject) what you want to know next is what they've done (the Verb). For instance, "Harry smashed the window". Hence, SVO order. However, if you've got more than one child, then what you want to know after the perpetrator is who they've done it to. For instance "Harry Hannah Hit". Hence, SOV order.
Actually, a more interesting hypothesis is that having more children around influences the learning pressures on language. If children find SOV easier to process or learn, there is a pressure on language to change to fit their cognitive niche. Therefore, we should expect societies with more children to exhibit this word order. Indeed, some previous studies suggest that SOV order is the most basic or ancestral form historically (Luke Maurits, in press, RT coverage, paper, video lecture, Gell-Mann & Ruhlen, 2011 - see post). However, Maurits' work suggests that SVO is actually more efficient from an information-theoretic perspective. Do children have different cognitive biases? Alternatively, if children can negotiate their communication system with their parents (as Suzanne Quay argues), then they might push for SOV order more often. I'm not sure there's a lot of support for this, though.
The results above show the absolute strength of the relationship between word order and number of children. However, there are many links between social and linguistic variables, and indeed we should expect cultural variables to be correlated at a rate greater than chance due to geographic diffusion. A much more robust approach is to hypothesise that basic word order will predict the number of children a speaker has better than any other linguistic variable. Therefore, I tested every linguistic variable in the WALS.
I ran a number of linear regressions with the number of children as the dependent variable and the linguistic variable as the independent variable, controlling for age, sex, if the person was married, if they were employed, their level of education and their religion. We can then look at the top predictors by the magnitude of the F-score of the coefficient for the linguistic variable. Here are the top 11 predictors out of the 142 linguistic variables:
|11||-18.178||Order of Subject, Object and Verb||4|
|9||-18.392||Nominal and Locational Predication||2|
|8||18.7233||Position of Case Affixes||4|
|7||-18.921||Number of Cases||6|
|5||-20.585||Voicing and Gaps in Plosive Systems||3|
|4||21.0051||Presence of Uncommon Consonants||4|
|3||21.5613||Order of Adjective and Noun||3|
|2||21.6676||Distance Contrasts in Demonstratives||4|
|1||-23.106||Front Rounded Vowels||3|
The order of Subject, Object and Verb is the 11th best linguistic predictor of the number of children a person has. Christos was on to something.
Order of Subject, Object and Verb is in the top 15% of linguistic variables.
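The "one regression per linguistic variable, then rank" procedure can be sketched roughly as follows. This is not the analysis code behind the numbers above: it is a pure-Python toy that assumes a single control (age), binary linguistic features, and synthetic data in which only the hypothetical `feature_a` has a real effect. The names `solve`, `t_values` and `rank_features` are made up for this sketch, and the ranking statistic here is the t-value of the feature's coefficient.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def t_values(X, y):
    """Ordinary least squares; returns one t-value per column of X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance
    out = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]  # j-th diagonal entry of (X'X)^-1
        out.append(beta[j] / math.sqrt(s2 * inv_jj))
    return out


def rank_features(features, control, y):
    """Fit y ~ intercept + control + feature for each feature; rank by |t|."""
    results = []
    for name, values in features.items():
        X = [[1.0, c, v] for c, v in zip(control, values)]
        results.append((name, t_values(X, y)[2]))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Synthetic data (an assumption of this sketch, not survey data).
random.seed(1)
n = 500
age = [random.uniform(20, 60) for _ in range(n)]
features = {
    name: [random.choice([0.0, 1.0]) for _ in range(n)]
    for name in ("feature_a", "feature_b", "feature_c")
}
children = [
    0.05 * age[i] + 0.8 * features["feature_a"][i] + random.gauss(0, 1)
    for i in range(n)
]

for name, t in rank_features(features, age, children):
    print(f"{name}: t = {t:.2f}")
```

With real typological variables, which usually have more than two categories, each variable would need to be dummy-coded or otherwise encoded before it could enter a regression like this.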
A stepwise regression including the control variables and the top 15 linguistic variables resulted in the following model, which accounted for 37% of the variance (linguistic variables are listed first, sorted by significance). The order of Subject, Object and Verb does make it in, but it is the weakest linguistic predictor. I've included the results for religions. Note that this data encodes religious beliefs but also geographic region.
|Variable||Estimate||Std. Error||t value||p||Sig.|
|Distance Contrasts in Demonstratives||-1.638726||0.258902||-6.33||2.52E-10||***|
|Nominal and Locational Predication||-2.10949||0.387485||-5.444||5.27E-08||***|
|Presence of Uncommon Consonants||0.283943||0.052343||5.425||5.88E-08||***|
|Order of Adjective and Noun||-1.079676||0.230618||-4.682||2.87E-06||***|
|Position of Case Affixes||0.072319||0.022343||3.237||1.21E-03||**|
|Front Rounded Vowels||0.464718||0.14582||3.187||0.00144||**|
|Voicing and Gaps in Plosive Systems||0.363138||0.126894||2.862||4.22E-03||**|
|Order of Subject, Object and Verb||-0.048055||0.024726||-1.943||5.20E-02||.|
|Religion: Cao dai||0.003941||0.642107||0.006||0.995103|
|Religion: Church of Christ||0.877693||1.015201||0.865||0.387297|
|Religion: Don´t know||0.287651||0.236245||1.218||0.223395|
|Religion: Free church/Non denominational church||0.477016||0.59528||0.801||0.422951|
|Religion: Hoa hao||0.244492||0.260974||0.937||0.348852|
|Religion: Independent African Church (e.g. ZCC, Shembe, etc.)||-0.693823||1.015311||-0.683||0.494388|
|Religion: Israelita Nuevo Pacto Universal (FREPAP)||1.852352||1.431956||1.294||0.195826|
|Religion: Jehovah witnesses||0.15356||0.195869||0.784||0.433053|
|Religion: No answer||0.421118||0.160595||2.622||0.008743||**|
|Religion: Not applicable||0.030884||0.100801||0.306||0.759314|
|Religion: Other: Christian com||0.036961||0.592217||0.062||0.950236|
|Religion: Roman Catholic||0.193663||0.102697||1.886||0.059342||.|
|Religion: Salvation Army||0.635279||0.831224||0.764||0.444717|
|Religion: Seven Day Adventist||0.214763||0.321136||0.669||0.503657|
|Religion: The Church of Sweden||0.216487||0.490911||0.441||0.659225|
- Estimate = the size and direction of the relationship (in number of children). Positive values mean a positive relationship. E.g. you'll have on average 0.15 fewer children as your level of education increases.
- Std. Error = the uncertainty in the estimate. Smaller values mean the estimate is more precise.
- t value = the strength of the correlation (the estimate divided by its standard error). Large absolute values mean a stronger relationship.
- p = the probability of seeing a relationship at least this strong by chance if there were no real relationship
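As a rough check on how these columns relate, the t value in a regression table is the estimate divided by its standard error. The sketch below recomputes it for three rows copied from the table above. The p-value helper is an assumption of this sketch: it uses a normal approximation, which is reasonable only because the sample is large, so it will come out close to the tabled p-values but not identical to them.

```python
import math

# (estimate, standard error) copied from three rows of the table above.
rows = {
    "Distance Contrasts in Demonstratives": (-1.638726, 0.258902),
    "Nominal and Locational Predication": (-2.10949, 0.387485),
    "Presence of Uncommon Consonants": (0.283943, 0.052343),
}


def t_value(estimate, std_error):
    """t value = estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p_normal(t):
    """Two-sided p-value, assuming a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


for name, (estimate, std_error) in rows.items():
    t = t_value(estimate, std_error)
    print(f"{name}: t = {t:.3f}, approx. p = {two_sided_p_normal(t):.2e}")
```

The recomputed t values agree with the table to within rounding of the printed estimates and standard errors.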
Some other linguistic factors
Distance contrasts refer to deictic expressions such as 'this' (near speaker) and 'that' (further away from speaker) in English. People with more children tend to have more specific contrasts:
This makes sense, since if there are more children running around, you'll need more specific demonstratives to refer to them.
Another interaction was with ways of marking nominal (Bobby is a child) and locational (Bobby is in the garden) meanings. English has only one form for these two meanings (is), but other languages have separate forms. People with more children tend to speak languages with separate ways of marking the nominal and locational:
Some other interesting patterns emerged: People who say "Two children" (Numeral before noun) have fewer children than people who say "children two" (Noun before numeral). Below is the graph for the gender distinctions in pronouns:
But my favourite outcome is for distributive numerals. A sentence like "John and Mary have two children" has two possible interpretations: Either John and Mary have a total of 2 children between them, or John has 2 children and Mary has 2 other children. In English, we would distinguish these meanings lexically by putting 'each' or 'between them' at the end of the sentence. Some languages indicate this difference with morphology. According to the data, having no strategy for indicating this difference is linked to having more children:
Here's my explanation: If a person has no way of distinguishing the two meanings of the sentence above, then when they hear "John and Mary have two children", they might think "Damn, John and Mary have 4 children! I'd better get some more of my own ... ". This leads to a runaway effect of people having more children in order to keep up with their neighbours.
Weak Explanatory Power
Of course, these theories are crazy. However, their plausibility does not derive from the correlation - I could come up with even wackier stories about why these variables were connected and they would be equally supported by the correlations. As James Winters and I argue (Roberts & Winters, 2012, see here), these kinds of statistical tests are good for generating hypotheses, but have very weak explanatory power. They need to work together with idiographic, experimental and modelling approaches in order to support the mechanisms they suggest.
Roberts, S., & Winters, J. (2012). Constructing Knowledge: Nomothetic approaches to language evolution. Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language.
Gell-Mann, M., & Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences, 108 (42), 17290-17295. DOI: 10.1073/pnas.1113716108
Maurits et al. (in press). Why are some word orders more common than others? A uniform information density account. Advances in Neural Information Processing Systems, 23.