EvoLang proceedings are now online

This year, the proceedings of the Evolution of Language conference will appear online. The first group of papers is already up:

Browse the EvoLang Electronic Proceedings

The move to self-publishing is a bit of an experiment, but hopefully it’ll mean that the papers are more accessible to a wider audience.  To aid this, the papers are published under Creative Commons licenses.  Some papers also include supplementary materials.

The full list of papers will be updated as revisions come in, but here are some interesting papers available so far:


How spurious correlations arise from inheritance and borrowing (with pictures)

James and I have written about Galton’s problem in large datasets.  Because two modern languages can have a common ancestor, the traits that they exhibit aren’t independent observations.  This can lead to spurious correlations: patterns in the data that are statistical artefacts rather than indications of causal links between traits.

However, I’ve often felt like we haven’t articulated the general concept very well.  For an upcoming paper, we created some diagrams that try to present the problem in its simplest form.

Spurious correlations can be caused by cultural inheritance 


Above is an illustration of how cultural inheritance can lead to spurious correlations.  At the top are three independent historical cultures, each of which has a bundle of various traits which are represented as coloured shapes.  Each trait is causally independent of the others.  On the right is a contingency table for the colours of triangles and squares.  There is no particular relationship between the colour of triangles and the colour of squares.  However, over time these cultures split into new cultures.  Along the bottom of the graph are the currently observable cultures.  We now see a pattern has emerged in the raw numbers (pink triangles occur with orange squares, and blue triangles occur with red squares).  The mechanism that brought about this pattern is simply that the traits are inherited together, with some combinations replicating more often than others: there is no causal mechanism whereby pink triangles are more likely to cause orange squares.
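To make the mechanism concrete, here is a minimal sketch in Python. The trait bundles and the number of descendants per lineage are made-up numbers chosen for illustration, not values read off the diagram. Each descendant simply copies its ancestor's whole bundle, so no trait ever influences another.

```python
from collections import Counter

# Three hypothetical ancestral cultures, each a (triangle colour, square colour) bundle.
ancestors = [("pink", "orange"), ("blue", "red"), ("pink", "red")]

# Assumed number of present-day cultures descended from each ancestor.
descendants_per_ancestor = [4, 4, 1]

def descend(bundles, counts):
    """Every descendant inherits its ancestor's complete bundle unchanged."""
    return [bundle for bundle, n in zip(bundles, counts) for _ in range(n)]

def contingency(cultures):
    """Count how often each (triangle, square) combination occurs."""
    return Counter(cultures)

modern = descend(ancestors, descendants_per_ancestor)
before = contingency(ancestors)  # one observation per independent lineage
after = contingency(modern)      # raw counts over present-day cultures

print(before)
print(after)
# after: ('pink', 'orange') -> 4, ('blue', 'red') -> 4, ('pink', 'red') -> 1
```

The nine present-day cultures look like strong evidence that pink triangles go with orange squares and blue triangles with red squares, but there are still only three independent observations behind them.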

Spurious correlations can be caused by borrowing


Above is an illustration of how borrowing (or areal effects or horizontal cultural inheritance) can lead to spurious correlations. Three cultures (left to right) evolve over time (top to bottom). Each culture has a bundle of various traits which are represented as coloured shapes. Each trait is causally independent of the others. On the right is a count of the number of cultures with both blue triangles and red squares. In the top generation, only one out of three cultures has both. Over some period of time, the blue triangle is borrowed from the culture on the left to the culture in the middle, and then from the culture in the middle to the culture on the right. By the end, all cultures have blue triangles and red squares. The mechanism that brought about this pattern is simply that one trait spread through the population: there is no causal mechanism whereby blue triangles are more likely to cause red squares.
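The same process can be sketched in a few lines of Python. The starting triangle colours of the middle and right-hand cultures are assumptions made for illustration; the only rule in the model is that a recipient copies one trait from a donor.

```python
# Three hypothetical cultures, left to right. All start with red squares,
# but only the left-hand culture starts with a blue triangle.
start = [
    {"triangle": "blue", "square": "red"},
    {"triangle": "pink", "square": "red"},
    {"triangle": "green", "square": "red"},
]

def count_both(cultures):
    """Number of cultures with both a blue triangle and a red square."""
    return sum(c["triangle"] == "blue" and c["square"] == "red" for c in cultures)

def borrow(cultures, donor, recipient, trait):
    """Return a new generation in which the recipient has copied one trait from the donor."""
    updated = [dict(c) for c in cultures]
    updated[recipient][trait] = updated[donor][trait]
    return updated

# Left lends to middle, then middle lends to right.
history = [start]
for donor, recipient in [(0, 1), (1, 2)]:
    history.append(borrow(history[-1], donor, recipient, "triangle"))

counts = [count_both(generation) for generation in history]
print(counts)  # [1, 2, 3]
```

The co-occurrence count climbs from one to three even though the square colour never plays any role in the borrowing rule.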

A similar effect would be caused by a bundle of causally unrelated features being borrowed, as shown below.


Language Evolution 101: Gene’s Eye vs. DST

Broad hypotheses are better than narrow ones as they can be applied to a wider range of things. That’s probably a controversial thing to say, but it’s certainly true that the beauty of most evolutionary theory lies in its simplicity, and therefore its ability to be applied to more than just biology. So how do different evolutionary theories fare when applied to the world of language? I’ll look here at the gene’s eye view of evolution and developmental systems theory.

The gene’s eye view of evolution

The gene’s eye view of evolution splits evolution into two processes: replication and interaction. The replicators are the things which are copied (generally genes) and the interactors are the organisms which interact with their environment. In this post I will stick with the terms ‘replicator’ and ‘interactor’ as posited by Hull (1980), as opposed to Dawkins’ ‘replicator’ and ‘vehicle’. Hull’s terms are much more applicable to language because he formalised them as part of a generalised theory, which he himself has applied to cultural evolution (Hull 1988).

Maynard-Smith and Szathmáry (1995) argue that, because language and the genome are both recursive, only these two mechanisms have an unlimited number of heritable states, which is why a replicator view of natural selection can only account for these two mechanisms. Many linguists have tried to apply a replicator view to the evolution of language, with regard to both its biological and its cultural evolution. Regarding the cultural evolution of language, there seem to be many parallels with biological evolution, including the controversy over what can be considered a replicator. David Hull (1980) defines a replicator as “an entity that passes on its structure directly in replication”. Within language this could cover anything which allows us to say the same thing in a different way. This means that replicators can lie at a phonemic level, in that vowels can vary and some realisations will be more successful than others with regard to how well they contrast with other vowels. Morphemes can also vary and be more selectively successful in terms of productivity. Selection can work all the way up to lexemes and syntax, on a wide scale or on a narrow scale, with a specific idiosyncratic structure emerging in some frequently used phrases. If one of two interlocutors in a communicative act uses an idiosyncratic structure to express something, and is successful in being understood, then they will see little point in changing the utterance next time they want to express that proposition. This, presumably, is how a structure would ‘catch on’. Croft (2000) lumps all of these possible replicators under a general heading of ‘lingueme’ to make them more analogous with genes.

This may be an oversimplification. The layers of structure that appear in language are not present in the DNA sequence (or at least are not understood to the same level as they are in language) beyond the distinction between nucleotides, codons and ‘genes’. Even given this distinction, it is usually argued that single nucleotides and codons cannot be replicators, whereas it seems that the smallest particles of language structure can be.

Croft (2000) argues that the selection of linguistic replicators is driven by social factors, as he claims that speakers select variants with regard to their social values. However, since biological evolution involves not only functional selection but also sexual selection and social selection, it seems odd that language evolution would not also be driven by a combination of factors, both functional and social.

Language is not passed on purely by vertical transmission from one generation to the next, as genes are. Horizontal transmission is also present, and there is linguistic input from more than just the two parents of an individual. Horizontal gene transfer, which occurs when an organism acquires genetic material from a different organism other than through replication or reproduction, could be described as analogous to this. However, it certainly isn’t the norm within genetic evolution in the way that horizontal transmission is the norm for language (Pagel, 2009).

Developmental Systems Theory

Developmental Systems Theory (DST) is an approach to evolution that stands in opposition to the replicator/interactor view of natural selection. It takes the position that more needs to be taken into account than just replicators and interactors, and that if anything is the unit of selection then it is the entire developmental system of an organism. This stresses the importance of non-genetic factors and their role in evolution. Many layers of structure need to be considered, and each of these layers can only be accounted for in its own terms. A DST approach to the emergence of language is one which takes the whole developmental cycle of language acquisition and communication into account. The learning biases of children certainly count as a unique event which is responsible for individual differences in each generation. As well as this, the learning biases of adults can also contribute to language evolution from a DST approach in societies where there are many second language speakers (Wray and Grace 2005). Learning biases in transmission are often cited exclusively in the context of cultural evolution; however, learning biases have now come to give us a good explanation of how linguistic constraints may have become genetically assimilated after cultural transmission occurred, through mechanisms such as the Baldwin Effect (Baldwin, 1896). If there’s any call for it I’ll post a 101 on the Baldwin Effect in the near future.


Baldwin, J. M. (1896) A New Factor in Evolution. The American Naturalist, 30 (354), 441-451.

Croft, W. (2000) Explaining language change: an evolutionary approach. Harlow: Pearson.

Hull, D. L. (1980) Individuality and Selection. Annual Review of Ecology and Systematics, 11, 311-332.

Hull, D. L. (1988) Science as a process: an evolutionary account of the social and conceptual development of science. Chicago: University of Chicago Press.

Maynard-Smith, J. and Szathmáry, E. (1995) The major transitions in evolution.

Pagel, M. (2009) Human language as a culturally transmitted replicator. Nature Reviews Genetics, 10 (6), 405-415.

Pinker, S. and Bloom, P. (1990) Natural Language and Natural Selection. Behavioral and Brain Sciences, 13 (4), 707-726.

Wray, A. and Grace, G. (2005) The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form. Lingua, 117 (3), 543-578.


Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution

In scientific prognostication we have a condition analogous to a fact of archery—the farther back you draw your longbow, the farther ahead you can shoot.
– Buckminster Fuller

The following remarks are rather speculative in nature, as many of my remarks tend to be. I’m sketching large conclusions on the basis of only a few anecdotes. But those conclusions aren’t really conclusions at all, not in the sense that they are based on arguments presented prior to them. I’ve been thinking about cultural evolution for years, and about the need to apply sophisticated statistical techniques to large bodies of text—really, all the texts we can get, in all languages—by way of investigating cultural evolution.

So it is no surprise that this post arrives at cultural evolution and concludes with remarks on how the human sciences will have to change their institutional ways to support that kind of research. Conceptually, I was there years ago. But now we have a younger generation of scholars who are going down this path, and it is by no means obvious that the profession is ready to support them. Sure, funding is there for “digital humanities” and so deans and department chairs can get funding and score points for successful hires. But you can’t build a profound and new intellectual enterprise on financially-driven institutional gamesmanship alone.

You need a vision, and though I’d like to be proved wrong, I don’t see that vision, certainly not on the web. That’s why I’m writing this post. Consider it a sequel to an article I published back in 1976 with my teacher and mentor, David Hays: Computational Linguistics and the Humanist. This post presupposes the conceptual framework of that vision, but neither restates nor endorses its specific recommendations (given in the form of a hypothetical program for simulating the “reading” of texts).

The world has changed since then and in ways neither Hays nor I anticipated. This post reflects those changes and takes as its starting point a recent web discussion about recovering the history of literary studies by using the largely statistical techniques of corpus linguistics in a kind of digital archaeology. But like Tristram Shandy, I approach that starting point indirectly, by way of a digression.

Who’s Kemp Malone?

Back in the ancient days when I was still an undergraduate, and we tied an onion in our belts as was the style at the time, I was at an English Department function at Johns Hopkins and someone pointed to an old man and said, in hushed tones, “that’s Kemp Malone.” “Who is Kemp Malone?” I thought. From his Wikipedia bio:

Born in an academic family, Kemp Malone graduated from Emory College as it then was in 1907, with the ambition of mastering all the languages that impinged upon the development of Middle English. He spent several years in Germany, Denmark and Iceland. When World War I broke out he served two years in the United States Army and was discharged with the rank of Captain.

Malone served as President of the Modern Language Association, and other philological associations … and was etymology editor of the American College Dictionary, 1947.

Who’d have thought the Modern Language Association was a philological association?

What Does It Mean To Mean?

I’ve been agonizing somewhat over what to write as my first post. I am currently delving into the wonderful world of pragmatics via a graduate seminar at the University of Virginia, but I do not yet feel proficient enough to comment on the complex philosophical theories that I am reading. So, I am going to briefly present an overview of what I will be attempting to accomplish in my year-and-a-half-long thesis project. Upcoming entries will most likely be related to this topic, similar topics, and research that bears on the outcome of my investigation.

I was recently watching a debate between Richard Dawkins and Rowan Williams, the Archbishop of Canterbury, on the nature of the human species and its origin. To no one’s surprise, language was brought up when discussing human origins, specifically recursive, productive language as a distinguishing marker of the human species. What may seem obvious to the evolutionary linguists here actually came with some interesting problems from a biological perspective. As Dawkins discusses in the debate, evolution is rather difficult for the animal kingdom. Whereas for plants there may be distinct moments at which one can point and say “Here is when a new species emerged!”, this identifiable moment is less overt for animals. One key problem with determining the exact moment of a new species’ emergence is the question of interbreeding.

If we consider the development of a language (a system of communication with the aforementioned characteristics) to be a marker of the human species, then do we suppose at one point we have a child emerging with the ability to form a language with mute or animalistic parents? To whom would the child speak? If Dawkins is correct and language is partially rooted in a specific gene, we could theorize that the “first” human with the gene would thereby mate with proto-humans lacking the gene. All of this is, of course, very sketchy and difficult to elucidate, as even the theory that language is rooted in a gene can be disputed. The problem remains an integral one, not only for understanding the evolutionary origins, but as the philosophers in my pragmatics class would point out, it would also have significant bearing on ontological and ethical questions regarding human origins.

I do not hope to solve this entire issue in my senior thesis; however, I do hope to show the development of language less as a suddenly produced trait and more as a gradual process from a less developed system of communication to a more developed one. From a pragmatics point of view, the question might be: how do we jump the gap, so to speak, between the less developed systems of communication (conventionally, these include animal communication, natural meaning, etc.) and the fully fledged, unique system of human language? Paul Grice, as one might discover in my handy dandy Wikipedia link above, proposed a distinction between natural meaning, which he defined as a cause/effect indication and considered in terms of its factivity, and non-natural meaning, a communicative action that must be considered in terms of the speaker’s intentions. Yet, as stated above, the question remains: how do we (evolutionarily) progress from natural meaning to non-natural meaning?

Not to overly simplify, but my answer rests in the question of what it means to mean something. I hope to show, in my subsequent posts, that an investigation into semantics, and, more specifically, a natural progression through a hierarchy of types of meaning, might shed light on this problem. In short, taking a look at the development of meaning, intent, and the qualifications for a language proper can shed light on how language developed into the complex, unique phenomenon we study today.  (Oh, and to satisfy the philosophers in my class, I may ramble occasionally about the implications for a philosophical conception of our species!)


A history of evolution pt. 2: The Wealth of Nations, Populations and On the Origin

Title page of the original edition of Malthus' 1798 work


Memetic Sophistry

Over at the Psychology Today blog complex, Joseph Carroll is taking Norman Holland to task on remarks that Holland made concerning the relationship between the reader of a literary text and the text itself. Though I disagree with Carroll on many matters, I agree with him on this one particular issue. Beyond that, I think his critique of Holland can also be applied to Susan Blackmore’s equivocations on memes. Here’s what Carroll says about Holland:

This whole way of thinking is a form of scholastic sophistry, useless and sterile. It produces verbal arguments that consist only in fabricated and unnecessary confusions, confusions like that which you produce as your conclusion in the passage you cited from your book: “the reader constructs everything” (p. 176). This conclusion seems plausible because it slyly blends two separate meanings of the word “constructs.” One meaning is that our brains assemble percepts into mental images. That meaning is correct. The other meaning is that our brains assemble percepts that are not radically constrained by the signals produced in the book. That meaning is incorrect. Once you have this kind of ambiguity at work for you, you can shuffle back and forth between the two meanings, sometimes suggesting the quite radical notion that books don’t “impose” any constraints—any meanings—on readers; and sometimes retreating into the safety of the correct meaning: that our brains assemble percepts.

Blackmore equivocates in a similar fashion on the question of whether or not memes are active agents. Here’s a snippet from a TED talk she gave last year:

The way to think about memes, though, is to think, why do they spread? They’re selfish information, they get copied if they can. But some of them will be copied because they’re good, or true, or useful, or beautiful. Some of them will be copied even though they’re not. Some, it’s quite hard to tell why.

Here she talks of memes as though they are agents of some kind: they’re selfish and they try to get copied. A bit later she says:

So think of it this way. Imagine a world full of brains and far more memes than can possibly find homes. The memes are trying to get copied, trying, in inverted commas, i.e., that’s the shorthand for, if they can get copied they will. They’re using you and me as their propagating copying machinery, and we are the meme machines.

Here memes are using us as machines for propagating themselves. And then we have this passage where she talks about a war between memes and genes:

So you get an arms race between the genes which are trying to get the humans to have small economical brains and not waste their time copying all this stuff, and the memes themselves, like the sounds that people made and copied – in other words, what turned out to be language – competing to get the brains to get bigger and bigger. So the big brain on this theory is driven by the memes.

The term “meme,” as we know, was coined by Richard Dawkins, who is also responsible for anthropomorphizing genes as selfish agents in biological evolution. Dawkins knows perfectly well that genes aren’t agents, and is quite capable of explicating that selfishness in terms that eliminate the anthropomorphism, which is but a useful shorthand, albeit a shorthand that has caused a great deal of mischief.


Chomsky Chats About Language Evolution

If you go to this page at Linguistic Inquiry (house organ of the Chomsky school), you’ll find this blurb:

Episode 3: Samuel Jay Keyser, Editor-in-Chief of Linguistic Inquiry, has shared a campus with Noam Chomsky for some 40-odd years via MIT’s Department of Linguistics and Philosophy. The two colleagues recently sat down in Mr. Chomsky’s office to discuss ideas on language evolution and the human capacity for understanding the complexities of the universe. The unedited conversation was recorded on September 11, 2009.

I’ve neither listened to the podcast nor read the transcript; both are available here. But who knows, maybe you will. FWIW, I was strongly influenced by Chomsky in my undergraduate years, but the lack of a semantic theory was troublesome. Yes, there was so-called generative semantics, but that didn’t look like semantics to me, it looked like syntax.

Then I found Syd Lamb’s stuff on stratificational grammar & that looked VERY interesting. Why? For one thing, the diagrams were intriguing. For another, Lamb used the same formal constructs for phonology, morphology, syntax and (what little) semantics (he had). That elegance appealed to me. Still does, & I’ve figured out how to package a very robust semantics into Lamb’s diagrammatic notation. But that’s another story.