Having more children affects your basic word order

May 11, 2012 in Uncategorized

 

Last week in an EU:Sci podcast, Christos Christodoulopoulos challenged me to find a correlation between the basic word order of the language people use and the number of children they have.  This was off the back of a number of spurious correlations with which readers of Replicated Typo will be familiar.  Here are the results!

First, I do a straightforward test of whether word order is correlated with the number of children you have.  This comes out as significant!  I wonder if having more children hanging around affects the adaptive pressures on language?  However, I then show that this result is undermined by the discovery that other linguistic variables are even better predictors.

I used the World Values Survey:  a large database of survey results from thousands of people around the world, including what language they speak and how many children they have.  I then linked this up with linguistic typology data from the World Atlas of Language Structures.  This includes information on the basic word order of each language.
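The linking step can be pictured as a simple lookup join.  The sketch below is only an illustration: the respondent rows and field names are invented, and the small word-order lookup stands in for the full WALS feature table rather than reproducing the actual pipeline used for this post.

```python
# Hypothetical survey rows (invented for illustration, NOT World Values Survey data).
survey = [
    {"id": 1, "language": "English", "children": 2},
    {"id": 2, "language": "Japanese", "children": 1},
    {"id": 3, "language": "German", "children": 0},
    {"id": 4, "language": "Klingon", "children": 3},  # no typology entry
]

# Stand-in for the WALS lookup: language -> basic word order.
word_order = {
    "English": "SVO",
    "Japanese": "SOV",
    "German": "No dominant order",
}

# Attach the typological value to each respondent; respondents whose
# language has no entry in the lookup are dropped.
linked = [
    {**row, "word_order": word_order[row["language"]]}
    for row in survey
    if row["language"] in word_order
]

for row in linked:
    print(row["id"], row["language"], row["word_order"], row["children"])
```

Dropping unmatched respondents is one possible design choice; it keeps the later regressions free of missing values at the cost of a smaller sample.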

The hypothesis was that people who used particular basic word orders would have more children.  Testing this hypothesis directly, basic word order is a significant predictor of the number of children a person has (linear regression, controlling for age, sex, whether the person was married, whether they were employed, their level of education and their religion; t-value for basic word order = -18.179, p < 0.00001; the model explains 36% of the variance).  It turns out that speakers of SOV languages have more children than speakers of SVO languages, while speakers of languages with no dominant order have the fewest children on average (there wasn’t enough data for other word order types).  Indeed, in the strict interpretation of Evolution, SOV order is the fittest variant, since it is linked with having more offspring.
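For readers who want to see what a test of this kind looks like, here is a minimal sketch of an ordinary least squares regression with control variables, written in plain Python.  Everything in it is an assumption made for illustration: the eight rows are invented toy data, word order is reduced to a single SOV indicator, and only two controls are included.  The real analysis used far more data and controls and was presumably run in a statistics package.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Return (estimates, standard errors, t values) for y regressed on the columns of X."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t

# Toy rows, invented for illustration (NOT survey data):
# (children, speaks_sov_language, age_band, married)
rows = [
    (0, 0, 1, 0), (1, 0, 2, 1), (3, 0, 3, 1), (1, 0, 2, 0),
    (1, 1, 1, 0), (3, 1, 3, 1), (1, 1, 2, 1), (4, 1, 4, 1),
]
y = [float(r[0]) for r in rows]
X = [[1.0, float(r[1]), float(r[2]), float(r[3])] for r in rows]  # leading 1.0 = intercept

beta, se, t = ols(X, y)
names = ["(Intercept)", "SOV", "Age band", "Married"]
for name, b, s, tv in zip(names, beta, se, t):
    print(f"{name:12s} estimate={b:7.3f}  se={s:6.3f}  t={tv:7.3f}")
```

The t value on the SOV row plays the role of the word order t-value quoted above: it asks whether word order still predicts the number of children once the controls have had their say.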

An explanation might be found in information theory:  When hearing a sentence, you want the most unpredictable information at the start.  If you only have one child, then if you know the sentence is about them (the Subject) what you want to know next is what they’ve done (the Verb).  For instance, “Harry smashed the window”.  Hence, SVO order.  However, if you’ve got more than one child, then what you want to know after the perpetrator is who they’ve done it to.  For instance “Harry Hannah Hit”.  Hence, SOV order.

Actually, a more interesting hypothesis is that having more children around influences the learning pressures on language.  If children find SOV easier to process or learn, there is a pressure on language to change to fit their cognitive niche.  Therefore, we should expect societies with more children to exhibit this word order.  Indeed, some previous studies suggest that SOV order is the most basic or ancestral form historically (Maurits et al., in press; Gell-Mann & Ruhlen, 2011).  However, Maurits’ work suggests that SVO is actually more efficient from an information-theoretic perspective.  Do children have different cognitive biases?  Alternatively, if children can negotiate their communication system with their parents (as Suzanne Quay argues), then they might push for SOV order more often.  I’m not sure there’s a lot of support for this, though.

Relative strength

The results above show the absolute strength of the relationship between word order and number of children.  However, there are many links between social and linguistic variables, and indeed we should expect cultural variables to be correlated at a rate greater than chance due to geographic diffusion.  A much more robust approach is to hypothesise that basic word order will predict the number of children a speaker has better than any other linguistic variable.  Therefore, I tested every linguistic variable in the WALS.

I ran a number of linear regressions with the number of children as the dependent variable and the linguistic variable as the independent variable, controlling for age, sex, whether the person was married, whether they were employed, their level of education and their religion.  We can then look at the top predictors by the magnitude of the t-value of the coefficient for the linguistic variable (the values are signed, and the word order value matches the t-value reported above).  Here are the top 11 predictors out of the 142 linguistic variables:

| Rank | t value | Linguistic Variable | Levels |
|---|---|---|---|
| 11 | -18.178 | Order of Subject, Object and Verb | 4 |
| 10 | -18.259 | Distributive Numerals | 5 |
| 9 | -18.392 | Nominal and Locational Predication | 2 |
| 8 | 18.7233 | Position of Case Affixes | 4 |
| 7 | -18.921 | Number of Cases | 6 |
| 6 | 19.6516 | The Prohibitive | 4 |
| 5 | -20.585 | Voicing and Gaps in Plosive Systems | 3 |
| 4 | 21.0051 | Presence of Uncommon Consonants | 4 |
| 3 | 21.5613 | Order of Adjective and Noun | 3 |
| 2 | 21.6676 | Distance Contrasts in Demonstratives | 4 |
| 1 | -23.106 | Front Rounded Vowels | 3 |

The order of Subject, Object and Verb is the 11th best linguistic predictor of the number of children a person has. Christos was on to something.

Order of Subject, Object and Verb is in the top 15% of linguistic variables.
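The ranking is easy to check by hand.  The short sketch below re-sorts the statistics reported in the table by absolute value and works out where word order falls among the 142 variables; the variable names in the code are my own, and only the numbers come from the table.

```python
# (reported statistic, WALS feature) pairs copied from the table of top predictors.
reported = [
    (-18.178, "Order of Subject, Object and Verb"),
    (-18.259, "Distributive Numerals"),
    (-18.392, "Nominal and Locational Predication"),
    (18.7233, "Position of Case Affixes"),
    (-18.921, "Number of Cases"),
    (19.6516, "The Prohibitive"),
    (-20.585, "Voicing and Gaps in Plosive Systems"),
    (21.0051, "Presence of Uncommon Consonants"),
    (21.5613, "Order of Adjective and Noun"),
    (21.6676, "Distance Contrasts in Demonstratives"),
    (-23.106, "Front Rounded Vowels"),
]
TOTAL_VARIABLES = 142  # number of linguistic variables tested

# Rank by magnitude, ignoring the sign of the relationship.
ranked = sorted(reported, key=lambda pair: abs(pair[0]), reverse=True)
rank_of = {name: i + 1 for i, (_, name) in enumerate(ranked)}

word_order_rank = rank_of["Order of Subject, Object and Verb"]
share = word_order_rank / TOTAL_VARIABLES
print(word_order_rank, f"{share:.1%}")  # prints: 11 7.7%
```

So word order sits comfortably inside the top 15%, but ten other variables beat it.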

A stepwise regression including the control variables and the top 15 linguistic variables resulted in the following model, which accounted for 37% of the variance (linguistic variables are listed first, sorted by significance).  The order of Subject, Object and Verb does make it in, but it is the weakest linguistic predictor.  I’ve included the results for religions.  Note that the religion variable encodes religious beliefs but also geographic region.

| Variable | Estimate | Std. Error | t value | p | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 5.885786 | 1.136007 | 5.181 | 2.23E-07 | *** |
| Distance Contrasts in Demonstratives | -1.638726 | 0.258902 | -6.33 | 2.52E-10 | *** |
| The Prohibitive | 0.545461 | 0.087404 | 6.241 | 4.45E-10 | *** |
| Distributive Numerals | 0.312082 | 0.05664 | 5.51 | 3.64E-08 | *** |
| Nominal and Locational Predication | -2.10949 | 0.387485 | -5.444 | 5.27E-08 | *** |
| Presence of Uncommon Consonants | 0.283943 | 0.052343 | 5.425 | 5.88E-08 | *** |
| Order of Adjective and Noun | -1.079676 | 0.230618 | -4.682 | 2.87E-06 | *** |
| Position of Case Affixes | 0.072319 | 0.022343 | 3.237 | 1.21E-03 | ** |
| Front Rounded Vowels | 0.464718 | 0.14582 | 3.187 | 0.00144 | ** |
| Voicing and Gaps in Plosive Systems | 0.363138 | 0.126894 | 2.862 | 4.22E-03 | ** |
| Order of Subject, Object and Verb | -0.048055 | 0.024726 | -1.943 | 5.20E-02 | . |
| Age2 | 1.058244 | 0.027837 | 38.016 | < 2e-16 | *** |
| Age3 | 1.631778 | 0.031804 | 51.307 | < 2e-16 | *** |
| Age4 | 2.034569 | 0.03893 | 52.262 | < 2e-16 | *** |
| Age5 | 2.007591 | 0.092081 | 21.802 | < 2e-16 | *** |
| Sex: Male | 0.222451 | 0.71478 | 0.311 | 0.75564 | |
| Sex: Female | 0.447403 | 0.714745 | 0.626 | 0.531348 | |
| Married | 0.787531 | 0.02325 | 33.872 | < 2e-16 | *** |
| Education | -0.157613 | 0.005094 | -30.94 | < 2e-16 | *** |
| Religion: Bahai | -0.80284 | 1.015179 | -0.791 | 0.42905 | |
| Religion: Buddhist | 0.408742 | 0.113193 | 3.611 | 0.000306 | *** |
| Religion: Cao dai | 0.003941 | 0.642107 | 0.006 | 0.995103 | |
| Religion: Christian | 0.386166 | 0.330088 | 1.17 | 2.42E-01 | |
| Religion: Church of Christ | 0.877693 | 1.015201 | 0.865 | 0.387297 | |
| Religion: Don´t know | 0.287651 | 0.236245 | 1.218 | 0.223395 | |
| Religion: Evangelical | 0.544132 | 0.116618 | 4.666 | 3.09E-06 | *** |
| Religion: Free church/Non denominational church | 0.477016 | 0.59528 | 0.801 | 0.422951 | |
| Religion: Gregorian | 0.45786 | 1.017574 | 0.45 | 0.65275 | |
| Religion: Hindu | 0.055063 | 0.130321 | 0.423 | 0.672652 | |
| Religion: Hoa hao | 0.244492 | 0.260974 | 0.937 | 0.348852 | |
| Religion: Independent African Church (e.g. ZCC, Shembe, etc.) | -0.693823 | 1.015311 | -0.683 | 0.494388 | |
| Religion: Israelita Nuevo Pacto Universal (FREPAP) | 1.852352 | 1.431956 | 1.294 | 0.195826 | |
| Religion: Jain | -0.001387 | 0.84262 | -0.002 | 0.998687 | |
| Religion: Jehovah witnesses | 0.15356 | 0.195869 | 0.784 | 0.433053 | |
| Religion: Jew | -0.267181 | 0.24443 | -1.093 | 0.274374 | |
| Religion: Mormon | 1.273458 | 0.646912 | 1.969 | 0.049024 | * |
| Religion: Muslim | 0.312295 | 0.127812 | 2.443 | 0.014559 | * |
| Religion: Native | 0.084522 | 0.445898 | 0.19 | 0.849661 | |
| Religion: No answer | 0.421118 | 0.160595 | 2.622 | 0.008743 | ** |
| Religion: Not applicable | 0.030884 | 0.100801 | 0.306 | 0.759314 | |
| Religion: Orthodox | 0.097043 | 0.133664 | 0.726 | 0.467835 | |
| Religion: Other | 0.145137 | 0.124887 | 1.162 | 0.245192 | |
| Religion: Other: Christian com | 0.036961 | 0.592217 | 0.062 | 0.950236 | |
| Religion: Pentecostal | 0.138404 | 0.254035 | 0.545 | 0.585882 | |
| Religion: Protestant | 0.275903 | 0.107648 | 2.563 | 0.010385 | * |
| Religion: Roman Catholic | 0.193663 | 0.102697 | 1.886 | 0.059342 | . |
| Religion: Salvation Army | 0.635279 | 0.831224 | 0.764 | 0.444717 | |
| Religion: Seven Day Adventist | 0.214763 | 0.321136 | 0.669 | 0.503657 | |
| Religion: Sikh | -1.052661 | 1.438875 | -0.732 | 0.464431 | |
| Religion: Spiritualists | 0.596523 | 1.015377 | 0.587 | 0.556883 | |
| Religion: Taoist | -1.926019 | 1.432068 | -1.345 | 0.178667 | |
| Religion: The Church of Sweden | 0.216487 | 0.490911 | 0.441 | 0.659225 | |

The jargon:

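  • Estimate = the size and direction of the relationship (in number of children). Positive values mean a positive relationship. E.g. you’ll have on average 0.16 fewer children for each step up in your level of education.
  • Std. Error = how uncertain the estimate is. Smaller values mean the estimate is pinned down more precisely.
  • t value = the strength of the relationship relative to its uncertainty (the estimate divided by its standard error). Large positive values mean strong evidence for a positive relationship, large negative values strong evidence for a negative one.
  • p = the probability of seeing a relationship at least this strong by chance if there were really no relationship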
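As a quick sanity check on how these columns fit together, the t value is just the estimate divided by its standard error.  The sketch below recomputes it for three rows of the model table.  The p-value line uses a normal approximation, which is my own simplification (reasonable only because the sample is large), so it will not match the table to the last digit.

```python
import math

# (variable, estimate, standard error, reported t value), copied from the model table.
table_rows = [
    ("The Prohibitive", 0.545461, 0.087404, 6.241),
    ("Education", -0.157613, 0.005094, -30.94),
    ("Order of Subject, Object and Verb", -0.048055, 0.024726, -1.943),
]

def t_value(estimate, std_error):
    """t value = estimate divided by its standard error."""
    return estimate / std_error

def approx_two_sided_p(t):
    """Two-sided p-value under a normal approximation (an assumption, see above)."""
    return math.erfc(abs(t) / math.sqrt(2))

for name, estimate, std_error, reported_t in table_rows:
    t = t_value(estimate, std_error)
    p = approx_two_sided_p(t)
    print(f"{name}: t = {t:.3f} (table: {reported_t}), approx. p = {p:.3g}")
```

The word order row lands just above the conventional 0.05 threshold, which is why it only earns a "." in the table.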

Some other linguistic factors

Distance contrasts refer to deictic expressions such as ‘this’ (near speaker) and ‘that’ (further away from speaker) in English.  People with more children tend to have more specific contrasts:

This makes sense, since if there are more children running around, you’ll need more specific demonstratives to refer to them.

Another interaction was with ways of marking nominal (Bobby is a child) and locational (Bobby is in the garden) meanings.  English has only one form for these two meanings (is), but  other languages have separate forms.  People with more children tend to speak languages with separate ways of  marking the nominal and locational:

Some other interesting patterns emerged:  People who say “Two children” (Numeral before noun) have fewer children than people who say “children two” (Noun before numeral).  Below is the graph for the gender distinctions in pronouns:

 

But my favourite outcome is for distributive numerals. A sentence like “John and Mary have two children” has two possible interpretations:  Either John and Mary have a total of 2 children between them, or John has 2 children and Mary has 2 other children.  In English, we would distinguish these meanings lexically by putting ‘each’ or ‘between them’ at the end of the sentence.  Some languages indicate this difference with morphology.  According to the data, having no strategy for indicating this difference is linked to having more children:

Here’s my explanation:  If a person has no way of distinguishing the two meanings of the sentence above, then when they hear “John and Mary have two children”, they might think “Damn, John and Mary have 4 children!  I’d better get some more of my own … “.  This leads to a runaway effect of people having more children in order to keep up with their neighbours.

Weak Explanatory Power

Of course, these theories are crazy.  However, their plausibility does not derive from the correlation: I could come up with even wackier stories about why these variables were connected and they would be equally supported by the correlations.  As James Winters and I argue (Roberts & Winters, 2012), these kinds of statistical tests are good for generating hypotheses, but have very weak explanatory power.  They need to work together with idiographic, experimental and modelling approaches in order to support the mechanisms they suggest.

 

Roberts, S., & Winters, J. (2012). Constructing Knowledge: Nomothetic approaches to language evolution. Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language.

Gell-Mann, M., & Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences, 108 (42), 17290-17295. DOI: 10.1073/pnas.1113716108

Maurits et al. (in press). Why are some word orders more common than others? A uniform information density account. Advances in Neural Information Processing Systems, 23.