Categorising languages through network modularity

Today I’ve been learning more about network structure (from Cris Moore) and I’ve applied my poor understanding and overconfidence to find language families from etymology data!

Here’s what I understand so far (see Clauset, Moore, & Newman, 2008): The modularity of a network measures how well a given division of the network into ‘communities’ fits its structure. A division with high modularity splits the graph so that there are more links within modules (or clusters) than you would expect by chance, and few links between them. In principle, you can search over the possible clusterings to find the one with the best score. I’m still hazy on how this is actually done in practice, and you can extend this to find hierarchies like in phylogenetics, but without some of the assumptions. Luckily, there’s a network analysis program called Gephi that does this automatically!
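To make the idea concrete, here is a minimal sketch in Python of the standard (Newman) modularity score, plus a brute-force search over every two-way split of a tiny graph. The graph, the node names and the function names are all made up for illustration, and this is not the algorithm Gephi runs: exhaustive search only works on toy examples, and real tools rely on heuristics.

```python
from itertools import combinations


def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: dict mapping node -> label.
    """
    m = len(edges)
    internal = {}      # number of edges inside each community
    total_degree = {}  # summed degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
        # Each edge adds one to the degree of both of its endpoints.
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
    # Q = sum over communities of
    #     (fraction of edges inside) - (expected fraction under random wiring)
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


def best_two_way_split(nodes, edges):
    """Try every split of the nodes into (at most) two groups; keep the best Q."""
    first, rest = nodes[0], nodes[1:]
    best_q, best_split = float("-inf"), None
    # Pin the first node to group 0 so each split is only counted once.
    for size in range(len(rest) + 1):
        for group in combinations(rest, size):
            community_of = {n: 1 for n in nodes}
            community_of[first] = 0
            for n in group:
                community_of[n] = 0
            q = modularity(edges, community_of)
            if q > best_q:
                best_q, best_split = q, community_of
    return best_q, best_split


# Toy graph (made up): two triangles joined by a single bridging edge.
nodes = ["A1", "A2", "A3", "B1", "B2", "B3"]
edges = [
    ("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
    ("B1", "B2"), ("B2", "B3"), ("B1", "B3"),
    ("A3", "B1"),
]

q, split = best_two_way_split(nodes, edges)
print(round(q, 3), split)
```

On this toy graph the best split separates the two triangles. The number of possible splits doubles with every extra node, which is why exhaustive search does not scale to real etymology networks.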


The end of universals?

Woah, I just read some of the responses to Dunn et al. (2011) “Evolved structure of language shows lineage-specific trends in word-order universals” (covered on Language Log and on Replicated Typo). It’s come in for a lot of flak. One concern raised at the LEC was that, under an extreme interpretation, there may be no effect of universal biases on language structure. This goes against Generativist approaches, but also against the Evolutionary approach adopted by LEC-types. For instance, Kirby, Dowman & Griffiths (2007) suggest that there are weak universal biases which are amplified by culture. On that view, there should be some trace of universality nonetheless.

Below is the relationship diagram for Indo-European and Uto-Aztecan feature dependencies from Dunn et al. Bolder lines indicate stronger dependencies. The two families appear to have different dependencies: only one is shared (Genitive-Noun and Object-Verb).

However, I looked at the median Bayes Factors for each of the possible dependencies (available in the supplementary materials). These are the raw numbers that the above diagrams are based on. If the dependencies rank in roughly the same order of strength in two families, those families will have a high Spearman rank correlation.

Spearman rank correlation    Indo-European       Austronesian
Uto-Aztecan                  0.39, p = 0.04      0.25, p = 0.19
Indo-European                n/a                 -0.13, p = 0.49

Spearman rank correlation coefficients and p-values for Bayes Factors for different dependency pairs in different language families.  Bantu was excluded because of missing feature data.
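For anyone who wants to see what a rank correlation like the ones in the table involves, here is a minimal sketch in plain Python. The two lists of numbers are invented placeholders, not the Bayes Factors from the supplementary materials, and the function names are my own. It computes only the coefficient, not the p-value.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented example: one value per feature pair, in the same order for both families.
family_a = [0.5, 2.1, 7.3, 1.0, 4.4]
family_b = [0.9, 1.5, 3.0, 6.2, 2.2]
print(spearman(family_a, family_b))
```

Because only the ranks matter, two families can have very different raw Bayes Factors and still correlate highly, as long as the same dependencies come out relatively strong or weak in both.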

Although the Indo-European and Uto-Aztecan families have different strong dependencies, they have similar rankings of those dependencies. That is, two features with a weak dependency in an Indo-European language tend to have a weak dependency in a Uto-Aztecan language, and the same is true of strong dependencies. The same is true to some degree for Uto-Aztecan and Austronesian languages, although that correlation is weaker and not statistically significant (p = 0.19). This might suggest that there are, in fact, universal weak biases lurking beneath the surface. Lucky for us.

However, this does not hold between Indo-European and Austronesian language families.  Actually, I have no idea whether a simple correlation between Bayes Factors makes any sense after hundreds of computer hours of advanced phylogenetic statistics, but the differences may be less striking than the diagram suggests.

UPDATE:

As Simon Greenhill points out below, the statistics are not at all conclusive.  However, I’m adding the graphs for all Bayes Factors (these are made directly from the Bayes Factors in the Supplementary Material):

[Graphs of the Bayes Factors for each family: Austronesian, Bantu, Indo-European, Uto-Aztecan]

Dunn, M., Greenhill, S. J., Levinson, S. C., & Gray, R. D. (2011). Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473, 79-82.

Anthropologists Trace Human Origins Back To One Large Goat

In what is sure to be a more-cited paper than Gould and Lewontin (1979), Douglas Ochs at Columbia University, together with a team of internationally renowned scientists (and probably a few internationally unknown graduate students), has found that all of humanity can be traced back to a large Pliocene-era goat.

More interesting, for this blog at least, is the finding that the roots of early Indo-European language were in goat bleating. Unfortunately, I couldn’t track down the actual paper myself to find the details of this argument, but if you’re interested, I would suggest looking at the original article where I found this wonderful and groundbreaking study, on the popular peer-reviewed site The Onion.

Full disclosure: This post has been listed in the Irrelevant and Irreverent category, because it probably fits there. We’re not seriously suggesting that humans do in fact go back to a single large goat species in the Pliocene – that’s much too early. Rather, it’s more likely that the goat species was around in the Silurian period. It feasted mainly on trilobites.

Language, Thought, and Space (II): Universals and Variation

Spatial orientation is crucial when we try to navigate the world around us. It is a fundamental domain of human experience and depends on a wide array of cognitive capacities and integrated neural subsystems. Most important for spatial cognition, however, are the frames of reference we use to locate and classify ourselves, others, objects, and events.

Often, we define a landmark (say ourselves, or a tree, or the telly) and then define an object’s location in relation to this landmark (the mouse is to my right, the bike lies left of the tree, my keys have fallen behind the telly). But as it turns out, many languages are not able to express a coordinate system with the meaning of the English expression “left of.” Instead, they employ a compass-like system of orientation.

They do not use a relative frame of reference, as in the English “the cat is behind the truck”, but instead use an absolute frame of reference that can be illustrated in English by sentences such as “the cat is north of the truck” (Levinson 2003: 3). This may seem exotic to us, but for many languages it is the dominant – although often not the only – way of locating things in space.

What cognitive consequences follow from this?


Words as alleles: A null-model for language evolution?

For me, recent computational accounts of language evolution provide a compelling rationale that cultural, as opposed to biological, evolution is fundamental in understanding the design features of language. The basis for this rests on the simple notion of language being not only a conveyor of cultural information, but also a socially learned and culturally transmitted system: that is, an individual’s linguistic knowledge is the result of observing the linguistic behaviour of others. This well-attested process of language acquisition, often termed Iterated Learning, emphasises the effects of differential learnability on competing linguistic variants. Sounds, words and grammatical structures are therefore seen to be the products of selection and directed mutation. As the use of terms such as selection and mutation suggests, we can draw many parallels between the literature on language evolution and analogous processes in biology. Indeed, Darwin himself noted such similarities in The Descent of Man. However, one thing evolutionary linguists don’t seem to borrow is the null model. Is it possible that the changes we see in languages over time are just the products of processes analogous to genetic drift?
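As a rough illustration of what such a null model could look like, here is a minimal Python sketch of neutral drift among competing word variants: each generation, every speaker copies a variant from a randomly chosen member of the previous generation, with no selection and no mutation. The population size, the variant names and the function name are all assumptions made up for this example, not a model from the literature discussed here.

```python
import random
from collections import Counter


def neutral_drift(initial_variants, generations, rng):
    """Simulate neutral copying: no variant is easier to learn than any other.

    initial_variants: one entry per speaker, naming the variant they use.
    Returns a list of Counters, one per generation (including the first).
    """
    population = list(initial_variants)
    history = [Counter(population)]
    for _ in range(generations):
        # Every new speaker copies one randomly chosen speaker from the
        # previous generation, so the population size stays constant.
        population = [rng.choice(population) for _ in population]
        history.append(Counter(population))
    return history


# Hypothetical starting point: 20 speakers split evenly between two variants.
rng = random.Random(42)  # seeded so that a run can be repeated
history = neutral_drift(["sofa"] * 10 + ["couch"] * 10, generations=30, rng=rng)
for generation in (0, 10, 20, 30):
    print(generation, dict(history[generation]))
```

Even though neither variant has any advantage, the frequencies wander from generation to generation, and in a small population one variant will eventually take over by chance alone. That is the kind of baseline that claims about selection would have to be compared against.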


Can linguistic features reveal time depths as deep as 50,000 years ago?

Throughout much of our history language was transitory, existing only briefly within its speech community. The invention of writing systems heralded a way of recording some of its recent history, but for the most part linguists lack the stone tools archaeologists use to explore the early history of ancient technological industries. The question of how far back we can trace the history of languages is therefore an immensely important, and highly difficult, one to answer. However, it’s not impossible. Like biologists, who use highly conserved genes to probe the deepest branches on the tree of life, some linguists argue that highly stable linguistic features hold the promise of tracing ancestral relations between the world’s languages.

Previous attempts using cognates to infer the relatedness between languages are generally limited to predictions within the last 6,000-10,000 years. In the present study, Greenhill et al. (2010) decided to examine more stable linguistic features than the lexicon, arguing:


Recent Abstracts #1

In an effort to update this blog regularly, I’ve decided to take the lazy route and post up a list of abstracts. This will only happen once a week, but it’s a useful resource (for me at least), and will usually be an indicator of what articles I’m going to write about in the near future.


Current Issues in Language Evolution

As part of my assessment this term I’m to write four mock peer-reviewed items for a module called Current Issues in Language Evolution. It’s a great module run by Simon Kirby, examining some of the best food for thought in the field. This alone is an interesting endeavour (after all, we’re right in the middle of a language evolution renaissance), but even cooler are the lectures, where students get to give their own presentations on a particular paper. I already did my presentation at the start of this term, on Dediu and Ladd’s paper, which went rather well, even if one of my slip-ups did not go unnoticed (hint: always label the graphs). So, over the next few weeks, in amongst additional posts covering some of the presentations in class, I’ll hopefully be writing articles on these five papers:
