James and I have a new paper out in PLOS ONE where we demonstrate a whole host of unexpected correlations between cultural features. These include acacia trees and linguistic tone, morphology and siestas, and traffic accidents and linguistic diversity.
We hope it will be a touchstone for discussing the problems with analysing cross-cultural statistics, and a warning not to take all correlations at face value. It's becoming increasingly important to understand these issues, both for researchers as more data becomes available, and for the general public as they read more about these kinds of study in the media (e.g. recent coverage in National Geographic, the BBC and TED). But why are the public fascinated with these findings? Here's my guess:
People are always intrigued by stories of scientific discovery. From Mary Anning's discovery of a fossilised ichthyosaur when she was just 12 years old, to Fleming's accidental production of penicillin, to Newton's apple, it's tempting to think that anyone could trip over a major breakthrough that is out there just waiting to be found. This is perhaps why there has been so much media interest recently in studies which show surprising statistical links between cultural features, such as chocolate consumption and Nobel laureates, future tense and economic decisions, linguistic gender and power, or geography and phoneme inventory.
Everett recalled being shocked by his discovery. "I remember stepping out from my desk and saying, 'Okay, this is kind of crazy,'" he said. "My first question was, How had we not noticed this?"
That is, we live in an age when there is more data than ever before, it's more widely accessible, and there are better tools for analysing it. Anyone with an ordinary laptop and access to the internet could make these discoveries. Indeed, we've uncovered many unexpected correlations at Replicated Typo. However, just as Anning's discoveries were made as the theory of biological evolution was still developing, the ability to detect correlations in cultural features is outstripping the understanding of how to assess these findings. Early reconstructions of fossils included a lot of errors, some of which have been difficult to redress in the public's mind. Without a good understanding of cultural evolution, similar mistakes might be made during the current race to find statistical links in our field.
Everyone knows that correlation does not imply causation, but there are other problems inherent in studies of cultural features. One problem that is often discounted in these kinds of study is the historical relationship between cultures. Cultural features tend to diffuse in bundles, inflating the apparent links between causally unrelated features. This means that it's not a good idea to treat cultures or languages as independent of each other.
Here's an example: suppose we look at a group of high-school students and wonder whether the colour of their t-shirts correlates with the kind of food they bring for lunch. We survey 10 students, and see that 5 wear red t-shirts and eat peanut-butter sandwiches. This appears to be strong evidence for a link, but then we see that these 5 students come from the same family. There's now a better explanation for the trend: children from the same family tend to have the same choice of clothes and are given the same lunch by their parents.
The same problem exists for languages. Languages in the same historical family, like English and German, tend to have inherited the same bundles of linguistic features. For this reason, it can be quite complicated to work out whether there really are causal links between cultural properties.
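
The effect of shared descent can be sketched with a small simulation. The Python below is an illustrative toy, not an analysis from the paper: the function names, the Gaussian traits, the family sizes and the Fisher z-test are all assumptions made for the example. Two traits are generated with no causal link between them, but each is inherited from a family "ancestor". A naive test that treats all 50 cultures as independent data points is then run many times, once on 50 unrelated cultures and once on 5 families of 10.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_cultures(n_families, members_per_family, noise_sd, rng):
    """Two causally unrelated traits, each inherited from a family ancestor."""
    xs, ys = [], []
    for _ in range(n_families):
        ancestor_x = rng.gauss(0, 1)  # ancestral value of trait X
        ancestor_y = rng.gauss(0, 1)  # drawn independently of trait X
        for _ in range(members_per_family):
            # Each descendant keeps the ancestral value plus a little drift.
            xs.append(ancestor_x + rng.gauss(0, noise_sd))
            ys.append(ancestor_y + rng.gauss(0, noise_sd))
    return xs, ys


def false_positive_rate(n_families, members_per_family,
                        noise_sd=0.3, n_runs=1000, seed=1):
    """Share of runs in which a naive test calls the correlation significant."""
    rng = random.Random(seed)
    n = n_families * members_per_family
    hits = 0
    for _ in range(n_runs):
        xs, ys = simulate_cultures(n_families, members_per_family, noise_sd, rng)
        # Fisher z-test that treats all n cultures as independent data points.
        z = math.atanh(pearson(xs, ys)) * math.sqrt(n - 3)
        hits += abs(z) > 1.96  # two-sided test at the 5% level
    return hits / n_runs


independent_rate = false_positive_rate(n_families=50, members_per_family=1)
clustered_rate = false_positive_rate(n_families=5, members_per_family=10)
print(f"50 unrelated cultures:     {independent_rate:.2f}")
print(f"5 families of 10 cultures: {clustered_rate:.2f}")
```

Under these assumptions the first rate should sit near the nominal 5%, while the second should be many times higher, even though neither trait influences the other.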
Our paper tries to demonstrate the importance of controlling for this problem by pointing out a chain of statistically significant links, some of which are unlikely to be causal. The diagram below shows the links; those marked 'Results' are links that we discovered and demonstrate in the paper.
For instance, linguistic diversity is correlated with the number of traffic accidents in a country, even controlling for population size, population density, GDP and latitude. While there may be hidden causes, such as state cohesion, it would be a mistake to take this as evidence that linguistic diversity causes traffic accidents.
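
To make "controlling for" concrete, here is a minimal sketch of one standard version of the idea, a partial correlation with a single control variable. It is not the analysis from the paper: the data are simulated, and the names `confounder`, `trait_a` and `trait_b` are hypothetical. In this toy case the control removes the link entirely, because the confounder is the only thing the two traits share. The traffic accident result is the opposite situation: the correlation is still there after the listed controls, and it still should not be read as causal.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)
# Simulated data: both traits depend on the confounder, not on each other.
confounder = [rng.gauss(0, 1) for _ in range(500)]
trait_a = [c + rng.gauss(0, 1) for c in confounder]
trait_b = [c + rng.gauss(0, 1) for c in confounder]

raw_r = pearson(trait_a, trait_b)
controlled_r = partial_corr(trait_a, trait_b, confounder)
print(f"raw correlation:         {raw_r:.2f}")
print(f"after the control:       {controlled_r:.2f}")
```

A control like this only helps with variables that have been measured. It does nothing about shared history unless descent itself is included in the analysis.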
In the paper we suggest that correlation studies should demonstrate at least two things:
- That the hypothesised correlation is stronger than correlations between similar cultural features that are not expected to be linked.
- That the hypothesised correlation is robust against controlling for cultural descent.
We discuss some methods for achieving this, and demonstrate that they can debunk the spurious correlations that we discover in the first section. Many of these methods are straightforward and can be done quickly, so there's no excuse for avoiding them.
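
The two checks can be sketched in a few lines of Python. This is a toy under stated assumptions, not the methods from the paper: the data are made up, every function name is hypothetical, averaging within families is only a crude stand-in for a proper control for descent, and the Fisher z-statistic is very rough for so few data points.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, n):
    """Rough z-statistic for a correlation r from n independent points."""
    return math.atanh(r) * math.sqrt(n - 3)


def family_means(values, families):
    """Collapse cultures to one averaged data point per family."""
    groups = {}
    for value, fam in zip(values, families):
        groups.setdefault(fam, []).append(value)
    return [sum(groups[fam]) / len(groups[fam]) for fam in sorted(groups)]


def baseline_share(target_r, features):
    """Share of baseline feature pairs correlated at least as strongly as target_r."""
    rs = [abs(pearson(a, b))
          for i, a in enumerate(features) for b in features[i + 1:]]
    return sum(r >= abs(target_r) for r in rs) / len(rs)


# Made-up data: 4 families of 5 cultures, traits clustered within families.
family_x = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
family_y = {"A": 2.0, "B": 1.0, "C": 4.0, "D": 3.0}
dev_x = [-0.2, -0.1, 0.0, 0.1, 0.2]
dev_y = [0.1, -0.2, 0.0, 0.2, -0.1]
families, xs, ys = [], [], []
for fam in family_x:
    for dx, dy in zip(dev_x, dev_y):
        families.append(fam)
        xs.append(family_x[fam] + dx)
        ys.append(family_y[fam] + dy)

# Second check: does the link survive once each family counts only once?
r_all = pearson(xs, ys)
z_all = fisher_z(r_all, len(xs))      # treats 20 cultures as independent
fam_xs, fam_ys = family_means(xs, families), family_means(ys, families)
r_fam = pearson(fam_xs, fam_ys)
z_fam = fisher_z(r_fam, len(fam_xs))  # only 4 independent families

# First check: is the link stronger than links between unrelated features
# that are inherited within the same families?
rng = random.Random(42)


def inherited_feature():
    """A random feature: one value per family, plus small noise per culture."""
    fam_value = {fam: rng.gauss(0, 1) for fam in family_x}
    return [fam_value[fam] + rng.gauss(0, 0.1) for fam in families]


baseline = [inherited_feature() for _ in range(30)]
share = baseline_share(r_all, baseline)

print(f"all cultures: r = {r_all:.2f}, z = {z_all:.2f}")
print(f"family means: r = {r_fam:.2f}, z = {z_fam:.2f}")
print(f"baseline pairs at least as strong: {share:.0%}")
```

In this toy data the correlation looks significant when all 20 cultures are counted separately (|z| above 1.96), but not when each family contributes a single point, and a sizeable share of randomly generated feature pairs are correlated just as strongly.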
As well as careful statistical controls, correlation studies can also be assessed based on whether they are motivated by prior theory or not. For example, Lupyan & Dale's (2010) demonstration of a correlation between population size and morphological complexity was motivated by a long line of research on languages in contact. However, both kinds of discovery can be useful if they are seen in the context of a wider scientific method. We argue that correlation studies should be viewed as explorations of data, and as a sort of feasibility study for further, experimental, research. For example, the chance discovery of a link between genes and tone by Dediu & Ladd was not only statistically well controlled, but was used as the inspiration for more detailed laboratory experiments, rather than being seen as proof in itself.
Coming across statistical patterns by chance has always been part of the scientific process. However, with culture, it's much more difficult to intuitively distinguish real patterns from noise or historical influence. Correlations between unexpected features will continue to be exciting, but researchers should apply the right controls and see the studies as motivational rather than direct tests of hypotheses.
Roberts, S. & Winters, J. (2013). Linguistic Diversity and Traffic Accidents: Lessons from Statistical Studies of Cultural Traits. PLOS ONE, 8(8), e70902. doi:10.1371/journal.pone.0070902