The Great Mystery of the Vanishing Phonemes

It’s been well over a year since I first wrote about the relationship between phoneme inventory size and demography (see here and here). Since then, I have completed a thesis examining this relationship further, especially in the context of the relative roles of demography and tradeoffs between other linguistic subsystems (namely, a language’s lexicon and its morphological complexity). Outside my own bubble, the topic has exploded in popularity, culminating in the publication of Quentin Atkinson’s paper, Phonemic diversity supports a Serial Founder Effect Model of Language Expansion from Africa. It really hit home how big the topic was when I saw that the New York Times had picked up on the article. For me, this was a double-edged sword: obviously, I saw myself as the phoneme-guy over at Replicated Typo, so having someone else take this niche topic and make it popular dented my ego somewhat, but it was also a positive development in that the idea was now going to get the attention it deserved…

… Well, it sort of did and didn’t. Atkinson raised two major theoretical points in his paper. The first, and the one I’m interested in, made the link between phoneme inventory sizes, mechanisms of cultural transmission and the underlying demographic processes supporting these changes. Sadly, it was Atkinson’s second idea – that we could develop a serial founder effect model from Africa based on the phoneme inventory size – where most of the attention fell. In a methodological sense, I admired Atkinson’s approach to testing this second hypothesis, but I did feel he jumped the gun somewhat: I think more work was needed on the cultural transmission model before testing for serial founder effects. Indeed, the fact that we haven’t developed an initial model linking phoneme inventory size and demography may yet prove to be Atkinson’s downfall: we should be testing multiple explanatory models (Bayesian MCMC comparison, perhaps?) rather than taking a one-size-fits-all approach.

Continue reading “The Great Mystery of the Vanishing Phonemes”

Reconstructing linguistic phylogenies – a tautology?

So I thought I should begin my first post on here with a nice and gentle introductory sentence, but I realise that pointing out the increased use of computational phylogenetic tools on cultural and particularly linguistic data to the avid readers of this blog is probably a pretty pointless exercise.

There is of course a lot to say about parallels between biological and cultural evolution, and some of the work using computational tools has given us new insights into yet unanswered (and even hitherto unasked!) questions regarding language and language change. But today I’d like to share some thoughts on a particular “application” of phylogenetic tools, the methodology of which I find a bit odd, even though it is arguably the simplest evolutionary analogy of them all: using computational phylogenetics to reconstruct linguistic phylogenies.

Continue reading “Reconstructing linguistic phylogenies – a tautology?”

Language Evolution Session at EHBEA 2012

H/T: Evolutionary Linguistics.

Call deadline: 25 November 2011
Event Dates: 25-28 March 2012
Event Location: Durham, UK
Event URL: http://www.dur.ac.uk/jeremy.kendal/EHBEA2012/Welcome.html

Dear colleagues,

We are organising a special themed session on language evolution at the 2012 Annual Meeting of the European Human Behaviour and Evolution Association, which is held in Durham, UK, 25th-28th March 2012 (http://www.dur.ac.uk/jeremy.kendal/EHBEA2012/Welcome.html). EHBEA is an excellent venue for interdisciplinary work on the cultural and biological evolution of human behaviour, including language. Given that EHBEA is running shortly after EVOLANG next year, we are happy for research that is targeted at EVOLANG to also be submitted here, although note that the audience for each is likely to be different.

If you would like to submit an abstract for consideration as part of this themed session, please follow the submission instructions on the EHBEA website, marking your abstract as for consideration in the language evolution special session, organised by Simon Kirby and Kenny Smith. Abstracts will be independently reviewed by the usual EHBEA reviewers, so bear that in mind when preparing your submission. The themed session will only run if sufficient abstracts are accepted – of course, papers on language evolution could be presented independently as standard EHBEA talks.

The deadline for submissions is November 25th.

PLEASE FORWARD THIS MESSAGE TO ANYONE WHO MIGHT BE INTERESTED!

Best wishes,
Simon & Kenny

Tea Leaves and Lingua Francas: Why the future is not easy to predict

We all take comfort in our ability to project into the future. Be it through arbitrary patterns in Spring Pouchong tea leaves, or making statistical inferences about the likelihood that it will rain tomorrow, our accumulation of knowledge about the future is based on continued attempts at attaining certainty: that is, we wish to know what tomorrow will bring. Yet the difference between benignly staring at tea leaves and using computer models to predict tomorrow’s weather is fairly apparent: the former relies on a completely spurious relationship between tea leaves and events in the future, whereas the latter utilises our knowledge of weather patterns and then applies this to extrapolate from currently available data into the future. Put simply: if there are dense grey clouds in the sky, then it is likely we’ll get rain. Conversely, if tea leaves arrange themselves into the shape of a middle finger, it doesn’t mean you are going to be continually dicked over for the rest of your life. Although, as I’ll attempt to make clear below, these are differences of degree, rather than absolutes.

So, how are we going to get from tea leaves to Lingua Francas? Well, the other evening I found myself watching Dr Nicholas Ostler give a talk on his new book, The Last Lingua Franca: English until the Return to Babel. For those of you who aren’t familiar with Ostler, he’s a relatively well-known linguist, having written several successful books popularising socio-historical linguistics, and first came to my attention through Razib Khan’s detailed review of Empires of the Word. Indeed, on the basis of Razib’s post, I was not surprised by the depth of knowledge expounded during the talk. On this note alone I’m probably going to buy the book, as the work certainly filters into my own interests in historical contact between languages and its consequences. However, as you can probably infer from the previous paragraph, there were some elements I was slightly less impressed with, and it is here where we get into the murky realms between tea leaves and knowledge-based inferences. But first, here is a quick summary of what I took away from the talk:

Continue reading “Tea Leaves and Lingua Francas: Why the future is not easy to predict”

Robustness, Evolvability, Degeneracy and stuff like that…

Much of the work I plan to do for this year involves integrating traditional and contemporary theories of language change within an evolutionary framework. In my previous post I introduced the concept of degeneracy, which, to briefly recap, refers to components that have a structure-to-function ratio of many-to-one, with a single degenerate structure being capable of performing distinct functions under different conditions (pluripotent). Whitacre (2010: 5) provides a case in point for biological systems: here, the adhesin gene family in A. Saccharomyces “expresses proteins that typically play unique roles during development, yet can perform each other’s functions when expression levels are altered”.

But what about degeneracy in language? For a start, we already know from basic linguistic theory that forms (i.e. structures) are paired with meanings (i.e. functions). More recent work has expanded upon this notion, especially in developing the concept of constructions (Goldberg, 2003): “direct form-meaning pairings that range from the very specific (words or idioms) to the more general (passive constructions, ditransitive construction), and from very small units (words with affixes, walked) to clause-level or even discourse-level units” (Beckner et al., 2009: 5). When applied to constructions, degeneracy fits squarely with work identifying language as a Complex Adaptive System (see here) and as a culturally transmitted replicator (see here and here), which offers a link between the generation of first-order synchronic variation – in the form of innovation (e.g. newly introduced linguistic material in the form of sounds, words, grammatical constructions etc.) – and the selection, propagation and fixation of linguistic variants within a speaker community.

For the following example, I’m going to look at a specific type of discourse-pragmatic feature, or construction, which has undergone renewed interest over the last thirty years. Researchers are realising that these features, known as General Extenders (GEs) – utterance- or clause-final discourse particles such as and stuff and or something – are far from being superfluous linguistic baggage: they “carry social meaning, perform indispensable functions in social interaction, and constitute essential elements of sentence grammar” (Pichler, 2010: 582). Of specific relevance, GEs, and discourse-pragmatic particles more generally, are multifunctional: that is, they are not confined to a single communicative domain, and can even come to serve multiple roles within the same communicative context or utterance.

I propose that the concept of degeneracy will allow us to explain how multifunctional discourse markers emerge from variation that exists in the structural components of linguistic organisation, such as the phonological and morphosyntactic components. If anything, I hope the post might serve as some food for thought, as I’m still grappling with the applications of the theory (and whether there’s anything useful to say!).

Continue reading “Robustness, Evolvability, Degeneracy and stuff like that…”

New Blog: A Rare Bite of Linguistics

Being someone who likes to welcome new academic blogs on the scene, particularly ones of a linguistic tilt, I urge you to go over, visit, read and maybe even leave a comment at A Rare Bite of Linguistics. It’s only one post old, but the topic of language change and grammaticalisation fits in nicely with this blog’s overarching themes. As some of you might know, I wrote a bit about grammaticalisation at the start of this year, so the work is especially useful to lay folk such as myself. The post is the first of two that report the findings of the author’s MA project, which focused on the grammatical status of certainly in collocation with modal verbs. In the author’s own words:

My hypothesis is that the adverb is not fully grammaticalised even though it might show signatures of grammaticalisation.

Following Noël (2007), Bybee (2003) and Hopper and Traugott (2003) grammaticalisation affects a construction primarily and a single word secondarily; I suggest that, for modal synergy, a structural unit is formed of a modal verb and an adjacent modal adverb in mid-position, e.g. would certainly, must certainly etc. Mid-position is the ‘natural habitat’ of the modal particle and if there is grammaticalisation of certainly into a modal particle, this is consequently where we would expect to find it. Moreover, if this were a grammatical unit/construction consisting of two grammatical constituents, the grammaticality would lie in the bondedness (syntagmatic restriction) of the two elements, and the semantic and paradigmatic restrictions which are said to be part of grammaticalisation (cf. Lehmann’s parameters): we would expect an abstract meaning and perhaps reduced phonological properties (which I cannot test), paradigmaticity, low paradigmatic variability and high cohesion with modal verbs in general. Scope is a contested parameter and it seems that in this case too, we will deal with increased scope. Lastly, as Bybee (2003) indicated, frequency plays a staple role in the propagation of an item to becoming grammaticalised (see also Croft 2000).

It’s at quite a high level, but she does provide good, comprehensive definitions of what she’s studying and, more importantly, a fleshed out understanding of grammaticalisation theory and the processes underpinning it.

Neural Language Networks at Birth

I haven’t had a chance to read this paper, but it throws up some interesting discussion points relating to this blog. In particular, it relates to a hypothesis I put forward last year on Domain-General Regions and Domain-Specific Networks. Here is the abstract:

The ability to learn language is a human trait. In adults and children, brain imaging studies have shown that auditory language activates a bilateral frontotemporal network with a left hemispheric dominance. It is an open question whether these activations represent the complete neural basis for language present at birth. Here we demonstrate that in 2-d-old infants, the language-related neural substrate is fully active in both hemispheres with a preponderance in the right auditory cortex. Functional and structural connectivities within this neural network, however, are immature, with strong connectivities only between the two hemispheres, contrasting with the adult pattern of prevalent intrahemispheric connectivities. Thus, although the brain responds to spoken language already at birth, thereby providing a strong biological basis to acquire language, progressive maturation of intrahemispheric functional connectivity is yet to be established with language exposure as the brain develops.

Paper Link: http://www.pnas.org/content/108/38/16056.short?rss=1

Degeneracy, Evolution and Language

Having had several months off, I thought I’d kick things off by looking at a topic that’s garnered considerable interest in evolutionary theory, known as degeneracy. As a concept, degeneracy is a well known characteristic of biological systems, and is found in the genetic code (many different nucleotide sequences encode a polypeptide) and immune responses (populations of antibodies and other antigen-recognition molecules can take on multiple functions) among many others (cf. Edelman & Gally, 2001). More recently, degeneracy is appreciated as having applications in a wider range of phenomena, with Paul Mason (2010) offering the following value-free, scientific definition:

Degeneracy is observed in a system if there are components that are structurally different (nonisomorphic) and functionally similar (isofunctional) with respect to context.

A pressing concern in evolutionary research is how increasingly complex forms “are able to evolve without sacrificing robustness or the propensity for future beneficial adaptations” (Whitacre & Bender, 2010). One common solution is to refer to redundancy: duplicate elements that have a structure-to-function ratio of one-to-one (Mason, 2010). Nature does redundancy well, and this is exemplified by the human body: we have two eyes, two lungs, two kidneys, and so on. Still, even with redundant components, selection in biological systems would result in a situation where competitive elimination leads to the eventual extinction of redundant variants (ibid).

Continue reading “Degeneracy, Evolution and Language”

Should Mother Tongue be Father Tongue?

A new paper, published in Science last week, reviews some of the correlations which suggest that language change may be subject to sex-specific transmission. The evidence comes from comparing Y-chromosome DNA types with female (mitochondrial) DNA. Modern male DNA (Y-chromosome) is found to trace back to the population who originally spoke the language which has survived, whereas modern female DNA often does not.

This evidence has come from, among others, a study by Chaubey (2011) with evidence for the Indian subcontinent. Austroasiatic languages are spoken by tribes with a high proportion of immigrant Y-chromosome DNA from East Asia, but with a high percentage of local female (mitochondrial) DNA. This pattern was also true of the Tibeto-Burman language family in northeastern India.

Other studies found matching correlations in Africa: Niger-Congo languages correlate with Y-chromosome types, whereas the female DNA correlates more with geography (Wood et al., 2005; de Filippo et al., 2011).

Sex-biased language change can also be seen in the expansion of the Malayo-Polynesians in New Guinea. New Guinea has populations of Malayo-Polynesian speakers and also populations of Melanesian speakers. Malayo-Polynesian female DNA is about equally common in Malayo-Polynesian speaking areas and Melanesian speaking areas. However, the Malayo-Polynesian Y-chromosome is found far more often in the Malayo-Polynesian speaking areas than in the Melanesian speaking areas.

This pattern is also seen in Iceland where the female DNA is mainly British, but the Y-chromosome is mainly Scandinavian. This follows the pattern because the Icelandic language is also Scandinavian.

Forster and Renfrew (authors of the Science paper) show that these findings complement studies such as that of Stoneking and Delfin, who found that in East Asia it is women, rather than men, who move after marriage. This means that if a man and a woman migrate to a populated area, their female offspring will move to other villages when married but their male offspring will stay put, so their language stays in the same place as their Y-chromosomes.

Is this the only mechanism at work when correlations of sex-specific language change can be seen? Others have hypothesized that things such as farming and trade might be factors. Groups of emigrating agriculturalists may also contribute, where men outnumber women and take wives from the local community they are moving into. Men are also biologically capable of passing on and spreading about much more of their DNA than women can. It may also be the case that it is the father’s language rather than the mother’s which will be dominant within a family, but I think more research would have to be done on this.

Interestingly, the opposite correlation to the ones seen above is found in Greenland, where both the language and the female DNA are Eskimo but the Y-chromosome DNA is European.

Statistics and Symbols in Mimicking the Mind

MIT recently held a symposium on the current status of AI, which apparently has seen precious little progress in recent decades. The discussion, it seems, ground down to a squabble over the prevalence of statistical techniques in AI and a call for a revival of work on the sorts of rule-governed models of symbolic processing that once dominated much of AI and its sibling, computational linguistics.

Briefly, from the early days in the 1950s up through the 1970s both disciplines used models built on carefully hand-crafted symbolic knowledge. The computational linguists built parsers and sentence generators and the AI folks modeled specific domains of knowledge (e.g. diagnosis in selected medical domains, naval ships, toy blocks). Initially these efforts worked like gangbusters. Not that they did much by Star Trek standards, but they actually did something and they did things never before done with computers. That’s exciting, and fun.

In time, alas, the excitement wore off and there was no more fun. Just systems that got too big and failed too often and they still didn’t do a whole heck of a lot.

Then, starting, I believe, in the 1980s, statistical models were developed that, yes, worked like gangbusters. And these models actually did practical tasks, like speech recognition and then machine translation. That was a blow to the symbolic methodology because these programs were “dumb.” They had no knowledge crafted into them, no rules of grammar, no semantics. Just routines that learned while gobbling up terabytes of example data. Thus, as Google’s Peter Norvig points out, machine translation is now dominated by statistical methods. No grammars and parsers carefully hand-crafted by linguists. No linguists needed.

What a bummer. For machine translation is THE prototype problem for computational linguistics. It’s the problem that set the field in motion and has been a constant arena for research and practical development. That’s where much of the handcrafted art was first tried, tested, and, in a measure, proved. For it to now be dominated by statistics . . . bummer.

So that’s where we are. And that’s what the symposium was chewing over.

Continue reading “Statistics and Symbols in Mimicking the Mind”