population – Replicated Typo

Cultural differences in lateral transmission: Phylogenetic trees are OK for Linguistics but not biology

An article in PLoS ONE debunks the myth that hunter-gatherer societies borrow more words than agriculturalist societies. In doing so, it suggests that horizontal transmission is low enough for phylogenetic analysis to be a valid linguistic tool.

Lexicons from around 20% of the extant languages spoken by hunter-gatherer societies were coded for etymology (the codings are available in the supplementary material). The proportions of borrowed words were compared with those in the languages of agriculturalist and urban societies, taken from the World Loanword Database. The study focussed on three regions: Northern Australia, northwest Amazonia, and California and the Great Basin.

Contrary to some previous hypotheses, hunter-gatherer societies did not borrow significantly more words than agricultural societies in any of the regions studied.

The rates of borrowing were universally low, with most languages borrowing no more than 10% of their basic vocabulary. The mean rate for hunter-gatherer societies was 6.38%, while the mean for agriculturalist societies was 5.15%. This difference is significant overall, but not within particular regions. Therefore, the authors claim, “individual area variation is more important than any general tendencies of HG or AG languages”.

Interestingly, in some regions, mobility, population size and population density were significant factors. Mobile populations and low-density populations had significantly lower borrowing rates, while smaller populations borrowed proportionately more words. This may be in line with the theory of linguistic carrying capacity as discussed by Wintz (see here and here). The level of exogamy was a significant factor in Australia.

The study concludes that phylogenetic methods are a valid tool for linguistic analysis because the level of horizontal transmission is low. That is, languages are tree-like enough for phylogenetic assumptions to hold:

“While it is important to identify the occasional aberrant cases of high borrowing, our results support the idea that lexical evolution is largely tree-like, and justify the continued application of quantitative phylogenetic methods to examine linguistic evolution at the level of the lexicon. As is the case with biological evolution, it will be important to test the fit of trees produced by these methods to the data used to reconstruct them. However, one advantage linguists have over biologists is that they can use the methods we have described to identify borrowed lexical items and remove them from the dataset. For this reason, it has been proposed that, in cases of short to medium time depth (e.g., hundreds to several thousand years), linguistic data are superior to genetic data for reconstructing human prehistory.”

Excellent – linguistics beats biology for a change!

However, while the level of horizontal transmission might not be a problem in this analysis, the paths of borrowing might be. If a language borrows relatively few words, but those words come from many different languages and may have reached it by many different routes over previous generations, a subtle effect of horizontal transmission may be masked. The authors acknowledge that they did not address the direction of transmission in a quantitative way.

A while ago, I did a study of English etymology, trying to quantify the level of horizontal transmission through time (description here). The graph for English doesn’t look tree-like to me; perhaps the dynamics of borrowing work differently for languages with a high level of contact:

Claire Bowern, Patience Epps, Russell Gray, Jane Hill, Keith Hunley, Patrick McConvell, Jason Zentz (2011). Does Lateral Transmission Obscure Inheritance in Hunter-Gatherer Languages? PLoS ONE, 6(9). doi:10.1371/journal.pone.0025195

Does a Smart Phone make Smart Science?

A new paper in PLoS ONE, published today, shows that experiments on human cognition needn’t be confined to the lab.

Experiments on human cognitive abilities, such as language, often rely on testing small, homogeneous groups of volunteers (mostly undergraduate students) who come to research facilities to take part in behavioural experiments. This arrangement is not ideal. The sample will not be representative of the population as a whole, and it will also be small, since money and time only allow you to bring so many participants into the lab.

This new research by Dufau et al. shows that the sampling limitations of laboratory experiments can be overcome by using smartphones. With smartphone technology, data for cognitive science experiments can be collected from thousands of subjects all over the world.

To illustrate how this can be done, the authors carried out a large-scale study using iPhones and iPads. This was a linguistic study of people’s ability to distinguish words from similar non-words (a lexical decision task).

The project, which began in December 2010, collected data from 4,157 subjects in just 4 months! By comparison, the English Lexicon Project acquired a similar volume of data using traditional methods over more than 3 years.

The data were collected using applications produced in seven languages (English, Basque, Catalan, Dutch, French, Malay, Spanish). Smartphones can also support studies in writing systems other than the Roman alphabet, including Chinese, Greek, and Japanese. This opens the door to large-scale cross-linguistic studies without even having to move from behind your desk.

Whilst the example here is linguistic, there is every reason to think that smartphones could be used to study how universal other areas of cognitive behaviour are, or could even be applied in neuroscience and experimental philosophy. I wonder if it would be possible to carry out transmission chain experiments using smartphones.

However, I do worry that using things like iPhones will have the same problems as using services like Mechanical Turk. Experimenters cannot make sure that participants are carrying out the tasks properly, and quite a lot of control is lost. Smartphones are also still a luxury, so only people within a certain socio-economic class will have them. These methods may therefore not reach as wide an audience as hoped, and a wide audience seems to be the main reason they are being proposed in the first place.

The authors of the paper are hailing smartphones as “a potential revolution in cognitive science”, but only time will tell if this really takes off!

Reference

Stephane Dufau, Jon Andoni Duñabeitia, Carmen Moret-Tatay, Aileen McGonigal, David Peeters, F.-Xavier Alario, David A. Balota, Marc Brysbaert, Manuel Carreiras, Ludovic Ferrand, Maria Ktori, Manuel Perea, Kathy Rastle, Olivier Sasburg, Melvin J. Yap, J (2011). Smart Phone, Smart Science: How the Use of Smartphones Can Revolutionize Research in Cognitive Science. PLoS ONE, 6(9). doi:10.1371/journal.pone.0024974

The USA: Most linguistically diverse country on the planet?

When asked to name a linguistically diverse place, I would have said Papua New Guinea, and if asked to name a stereotypically monolingual country, I would have named the USA. However, this recent report from the New York Times (via Edinburgh University’s Lang Soc Blog) suggests that, thanks to its large immigrant population, New York harbours more endangered languages than anywhere else on Earth. From a field linguist’s point of view, this may make it much easier to discover and access minority languages (although it may mean the end of exotic holidays). From a cultural evolution point of view, a more global community may mean a radically different kind of competition between languages. Nice video below:

Sonority and Sex: Why smaller communities are louder

Through this post on Sprogmuseet about Atkinson’s analysis of the out-of-Africa hypothesis, I found an article by Ember & Ember (2007) on sonority and climate (the same authors also quantified the link between colour lexicon size and distance from the equator; see my post here). The article extends work by Fought et al. (2004), which found that a language’s sonority is related to climate. Sonority is a measure of amplitude (loudness) and is greater for vowels than for consonants (for example, see here). Basically, the warmer the climate, the greater the sonority of the population’s phoneme inventory. The theory is that “people in warmer climates generally spend more time outdoors and communicate at a distance more often than people in colder climates”.


Linguistic diversity and traffic accidents

I was thinking about Daniel Nettle’s model of linguistic diversity, which showed that linguistic variation tends to decline even with a small amount of migration between communities. I wondered whether statistics on population movement would correlate with a country’s linguistic diversity, as measured by its Greenberg Diversity Index (GDI) (see below). However, this is a cautionary tale about obsession and the use of statistics. (See the bottom of the post for a link to the data.)
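The Greenberg Diversity Index is usually defined as the probability that two randomly chosen people in a country have different mother tongues. For a large population this is 1 − Σ pᵢ², where pᵢ is the share of the population speaking language i. Below is a minimal Python sketch, assuming per-language speaker counts are already available. The function name and the toy counts are illustrative only, not data from the post.

```python
def greenberg_diversity_index(speaker_counts):
    """Return 1 - sum(p_i ** 2), where p_i is each language's share of speakers.

    0 means everyone speaks the same language; the value approaches 1 as
    speakers are spread across many equally sized languages.
    """
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((n / total) ** 2 for n in speaker_counts)


# Toy inputs (hypothetical counts, not census figures).
print(greenberg_diversity_index([1000]))                # 0.0: monolingual
print(greenberg_diversity_index([250, 250, 250, 250]))  # 0.75: four equal languages
```

The index depends only on speakers' shares. Doubling every count leaves it unchanged, so countries of very different sizes can be compared directly.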


Digital Humanities Sandbox Goes to the Congo

Or, Speculations in Computational Evolutionary Psychology

Note: This version of the post has been revised from an earlier version in which I suggested that the distribution in the first chart followed a power law. Cosma Shalizi checked it for me and it’s not a power law distribution. It’s an exponential distribution.
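One standard way to check a claim like this is to fit both candidate distributions by maximum likelihood and compare their log-likelihoods. This is the approach associated with Clauset, Shalizi and Newman's work on power laws. The Python sketch below is an assumption about how such a check could look, not a record of what Shalizi actually did. It treats paragraph lengths as continuous values at or above a chosen minimum `x_min`, uses the closed-form estimators for a shifted exponential and a continuous power law, and leaves out the significance test a full analysis would add.

```python
import math
import random


def exponential_loglik(xs, x_min):
    """Log-likelihood of a shifted exponential fitted by maximum likelihood.

    Density: lam * exp(-lam * (x - x_min)) for x >= x_min,
    with MLE lam = 1 / mean(x - x_min).
    """
    shifted = [x - x_min for x in xs]
    lam = len(shifted) / sum(shifted)
    return sum(math.log(lam) - lam * s for s in shifted)


def power_law_loglik(xs, x_min):
    """Log-likelihood of a continuous power law (Pareto) fitted by MLE.

    Density: (alpha - 1) / x_min * (x / x_min) ** -alpha for x >= x_min,
    with MLE alpha = 1 + n / sum(log(x / x_min)).
    """
    logs = [math.log(x / x_min) for x in xs]
    alpha = 1 + len(logs) / sum(logs)
    return sum(math.log((alpha - 1) / x_min) - alpha * l for l in logs)


def compare(xs, x_min):
    """Log-likelihood ratio: positive favours exponential, negative power law."""
    kept = [x for x in xs if x >= x_min]
    return exponential_loglik(kept, x_min) - power_law_loglik(kept, x_min)


# Synthetic check (not the Heart of Darkness data).
rng = random.Random(1)
exp_sample = [1.0 + rng.expovariate(1 / 50) for _ in range(5000)]
pareto_sample = [rng.paretovariate(1.5) for _ in range(5000)]
print(compare(exp_sample, 1.0))     # expected to be positive
print(compare(pareto_sample, 1.0))  # expected to be negative
```

On real paragraph lengths, the choice of `x_min` matters. Values must include at least one length strictly above `x_min`, or both estimators divide by zero.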

So, I’ve been exploring Conrad’s Heart of Darkness. In the last two posts I’ve examined one paragraph in the text, the so-called nexus. It’s the longest paragraph in the text, it’s structurally central, and it covers a lot of semantic territory.

OK, but what about the other paragraphs?

What about them?

Aren’t you going to look at them?

Well, yeah, but I sure don’t have time to troll through them like I did the nexus. I mean, that post stretched from here to Sunday.

I get your point. Why don’t you do the Moretti thing?

Moretti thing?

You know, distant reading.

Distant reading? You mean count something? Count what?

How about paragraph length?

What’ll that get me?

I don’t know. Just do it. I mean, you already know that the nexus is the longest paragraph in the text. There must be something going on with that. Mess around and see if something turns up.

* * * * *

I did and it did.

I used the MSWord word-count tool to count the words in every paragraph in the text. All 198 of them. One at a time. Real tedious stuff. Then I loaded the results into a spreadsheet and created a bar chart showing paragraph length from longest to shortest:
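The same tally can be done without counting one paragraph at a time. This is a minimal Python sketch, assuming the text is in a plain-text file whose paragraphs are separated by blank lines. The file name is hypothetical, and splitting on whitespace will not exactly match MS Word's count around hyphens and dashes.

```python
import re


def paragraph_lengths(text):
    """Split text on blank lines and return word counts, longest first."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return sorted((len(p.split()) for p in paragraphs), reverse=True)


sample = "One two three.\n\nFour five.\n\n\nSix seven eight nine."
print(paragraph_lengths(sample))  # [4, 3, 2]

# For the full text (hypothetical file name):
# with open("heart_of_darkness.txt", encoding="utf-8") as f:
#     lengths = paragraph_lengths(f.read())
```

The sorted list can go straight into a spreadsheet or plotting library to produce the longest-to-shortest bar chart.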


A random walk model of linguistic complexity

EDIT: Since writing this post, I have discovered a major flaw in the conclusion, which is described here.

One of the problems with large-scale statistical analyses of linguistic typologies is the temporal resolution of the data. Because we typically have only a single measurement for each population, we can’t see the dynamics of the system. A correlation between two variables that exists now may be an accident of more complex dynamics. For instance, Lupyan & Dale (2010) find a statistically significant correlation between a linguistic population’s size and its morphological complexity. One hypothesis is that the languages of larger populations adapt to adult learners as they come into contact with other languages. Hay & Bauer (2007) also link demography with phonemic diversity. However, it’s not clear how robust these relationships are over time, because of a lack of data on these variables in the past.

To test this, a benchmark is needed. One method is to use careful statistical controls, such as controlling for the area in which the language is spoken, the density of the population, etc. However, these data also tend to be synchronic. Another method is to compare the results against the predictions of a simple model. Here, I propose a simple model based on a dynamic in which cultural variants in small populations change more rapidly than those in large populations. This models the stochastic nature of small samples (see the introduction of Atkinson, 2011 for a brief review of this idea). The model tests whether chaotic dynamics lead to periods of apparent correlation between variables. Source code for this model is available at the bottom.
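The Python sketch below is a rough, hypothetical illustration of this dynamic. It is not the source code mentioned above. Each population gets a fixed size, and a single "complexity" value takes a random walk whose step size shrinks with population size. The 1/sqrt(size) scaling, the log-uniform size range and all parameter names are assumptions. Complexity is never driven by size, yet the cross-sectional correlation between log population size and complexity wanders away from zero over time.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(n_pops=100, steps=500, sigma=1.0, seed=42):
    """Return the size-complexity correlation after each time step.

    Population sizes are fixed, drawn log-uniformly between 10^2 and 10^6
    (an arbitrary choice). At each step, complexity moves by a Gaussian step
    with standard deviation sigma / sqrt(size), so small populations drift faster.
    """
    rng = random.Random(seed)
    sizes = [10 ** rng.uniform(2, 6) for _ in range(n_pops)]
    log_sizes = [math.log10(s) for s in sizes]
    complexity = [0.0] * n_pops
    correlations = []
    for _ in range(steps):
        complexity = [c + rng.gauss(0.0, sigma / math.sqrt(s))
                      for c, s in zip(complexity, sizes)]
        correlations.append(pearson(log_sizes, complexity))
    return correlations


corrs = simulate()
print(min(corrs), max(corrs))  # spread of "apparent" correlations over time
```

Counting how often |r| crosses a conventional significance threshold across many seeds would turn this into the kind of benchmark described above.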


Replicated Hauser Results

Some of you may remember that last summer Marc Hauser was found guilty of research misconduct. The investigation raised questions about several publications, including a 2007 paper in Science. That paper looked at the ability of non-human primates to understand the intentions of a human experimenter by interpreting his gestures.

Today Science has published a partial replication of the study in question which confirms the original findings that chimpanzees, cotton-top tamarins, and rhesus macaques can distinguish intentional gestures, such as pointing to indicate a container with food inside, from “accidental” actions such as a hand flopping against a container.

The Science website states the following:

Following the Harvard misconduct investigation, first author Justin Wood, now an assistant professor at the University of Southern California in Los Angeles, wrote to Science in June 2010 to notify the journal that the investigation had revealed that the original field notes for the rhesus experiments could not be found:

“An internal examination at Harvard University determined that there are no field notes, records of aborted trials, or subject identifying information associated with the rhesus monkey experiments; however, the research notes and videotapes for the tamarin and chimpanzee experiments were accounted for. Professor Hauser states that ‘most of the rhesus monkey observations were hand written by [co-author David D.] Glynn on a piece of paper, and then the daily results tallied and reported to Wood over email or by phone’ and then the raw data were discarded. The research assistant who performed the experiments (Glynn) confirmed that these field notes were discarded.”

Hauser and Wood returned to Cayo Santiago island in Puerto Rico to redo the experiments from the 2007 paper with the same population of free-ranging rhesus monkeys. Their data, including field notes and videos of the trials, are available online, and the results essentially match those reported in the original paper.

It is still not known what went wrong with the original experiment; a statement issued by Science today says only the following:

We stress that this new publication aims only to determine whether the original rhesus monkey experiments from the 2007 paper can be replicated. It has no bearing on questions raised about Dr. Hauser’s larger body of work.

This article from ScienceInsider quotes Dario Maestripieri as saying:

“The results of this replication are straightforward and entirely consistent with those of the original study. If the authors’ interpretation of their results is correct, these findings are very important and represent one of the clearest demonstrations that nonhuman primates can interpret the behavior of other individuals as intentional or non-intentional….Since the experimenter who tested the rhesus monkeys in the replication study appeared from the video to be the first author on the paper, Justin Wood, he was clearly knowledgeable of the hypotheses being tested and had some strong expectations and desires about the monkeys’ performance on the test.”

So is this replication a clarification of groundbreaking findings, or could the monkeys’ behaviour be down to the Clever Hans effect?

Meanwhile, investigations into Hauser’s research are still ongoing, and he is still banned from teaching for the next academic year.

The Return of the Phoneme Inventories

Right, I already referred to Atkinson’s paper in a previous post, and much of the work he’s presented is essentially part of a potential PhD project I’m hoping to do. Much of this dates back to last summer, when I mentioned how phoneme inventory size correlates with certain demographic features, such as population size and population density. Using the UPSID data, I generated a generalised additive model to demonstrate how area and population size interact in determining phoneme inventory size:

Interestingly, Atkinson seems to derive much of his thinking, at least in his choice of demographic variables, from work on the transmission of cultural artefacts (see here and here). For me, there are clear uses for these demographic models in testing hypotheses about linguistic transmission and change, as I see language as a cultural product. It appears Atkinson reached the same conclusion. Where we depart, however, is in our overall explanations of the data. My major problem with his claim is theoretical: he hasn’t ruled out other historical-evolutionary explanations for these patterns.

Before we get into the bulk of my criticism, I’ll provide a very brief overview of the paper.


Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa

Just read about an article on phoneme diversity via GNXP and Babel’s Dawn. Hopefully I’ll share some of my thoughts on the paper this weekend as it clearly ties in with work I’m currently doing (see here and here). Below is the abstract:

Human genetic and phenotypic diversity declines with distance from Africa, as predicted by a serial founder effect in which successive population bottlenecks during range expansion progressively reduce diversity, underpinning support for an African origin of modern humans. Recent work suggests that a similar founder effect may operate on human culture and language. Here I show that the number of phonemes used in a global sample of 504 languages is also clinal and fits a serial founder-effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, points to parallel mechanisms shaping genetic and linguistic diversity and supports an African origin of modern human languages.

Reference: Atkinson, Q. D. (2011). Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa. Science 332, 346. DOI: 10.1126/science.1199295.
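The serial founder effect described in the abstract can be illustrated with a toy simulation. The Python sketch below is a deliberate caricature under stated assumptions, not Atkinson's analysis, which fitted phonemic diversity against distance from an inferred origin. In the sketch, each new colony keeps each of its parent's phonemes independently with a fixed probability. No new phonemes arise, and there is no contact. The parameter names and values are hypothetical. An ordinary least-squares slope of inventory size against the number of founding events then comes out negative, so the inventory is clinal along the chain.

```python
import random


def serial_founder_chain(initial=100, colonies=10, retain=0.9, seed=7):
    """Inventory sizes along a chain of successive founding events.

    Each colony keeps every phoneme of its parent independently with
    probability `retain` (loss only: no innovation, borrowing or contact).
    """
    rng = random.Random(seed)
    inventory = set(range(initial))
    sizes = [len(inventory)]
    for _ in range(colonies):
        inventory = {p for p in inventory if rng.random() < retain}
        sizes.append(len(inventory))
    return sizes


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


sizes = serial_founder_chain()
print(sizes)
print(ols_slope(list(range(len(sizes))), sizes))  # negative: decline with distance
```

Under these assumptions the expected inventory after k founding events is initial × retain^k, so the decline is geometric rather than linear. A more realistic model would add innovation and borrowing, which could weaken the cline.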

Update: I’ve given a lengthier response here.