Review of correlational studies in linguistics

Articles from the first edition of the Annual Review of Linguistics are appearing online this week.  Bob Ladd, Dan Dediu and I wrote a review of correlations in linguistics.

We review a number of recent studies that have identified either correlations between different linguistic features (e.g., implicational universals) or correlations between linguistic features and nonlinguistic properties of speakers or their environment (e.g., effects of geography on vocabulary). We compare large-scale quantitative studies with more traditional theoretical and historical linguistic research and identify divergent assumptions and methods that have led linguists to be skeptical of correlational work. We also attempt to demystify statistical techniques and point out the importance of informed critiques of the validity of statistical approaches. Finally, we describe various methods used in recent correlational studies to deal with the fact that, because of contact and historical relatedness, individual languages in a sample rarely represent independent data points, and we show how these methods may allow us to explore linguistic prehistory to a greater time depth than is possible with orthodox comparative reconstruction.  Whether researchers are for or against these new techniques, understanding them is becoming increasingly necessary to interface with discussions in the field.

One of the most fun parts of putting the paper together was drawing this diagram (below) of all the links that we discuss.  It turns out that there are a lot of complicated links between linguistic and social variables!  I’m currently working on methods to disentangle this web.

We also include three appendices as supplementary materials: first, a list of electronic databases relevant for cross-cultural statistical comparisons; second, a very brief introduction to statistical hypothesis testing, which could be useful for linguists who are not familiar with statistical approaches; and third, a discussion of robustness and validity in statistical approaches to linguistics.
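To make the non-independence problem a little more concrete, here is a minimal sketch of one simple way of testing a correlation while respecting language families: a permutation test that only shuffles values within each family. This is an illustration under stated assumptions, not necessarily one of the methods discussed in the review; the function names and the toy data are made up.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def stratified_permutation_test(xs, ys, families, n_perm=2000, seed=0):
    """Return (observed correlation, p-value).

    The y values are shuffled only among languages of the same family,
    so differences between families are kept fixed in every permutation.
    """
    rng = random.Random(seed)
    observed = correlation(xs, ys)

    # Group the positions of the languages by family.
    groups = {}
    for i, family in enumerate(families):
        groups.setdefault(family, []).append(i)

    extreme = 0
    for _ in range(n_perm):
        permuted = list(ys)
        for indices in groups.values():
            values = [ys[i] for i in indices]
            rng.shuffle(values)
            for i, value in zip(indices, values):
                permuted[i] = value
        if abs(correlation(xs, permuted)) >= abs(observed):
            extreme += 1

    # Add one to numerator and denominator so p is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up data: two families that differ from each other, no variation inside.
xs = [1, 1, 1, 5, 5, 5]
ys = [2, 2, 2, 9, 9, 9]
families = ["A", "A", "A", "B", "B", "B"]
r, p = stratified_permutation_test(xs, ys, families)
print(r, p)
```

In this toy case the raw correlation is perfect, but it comes entirely from the difference between the two families, so shuffling within families never changes it and the test gives no evidence of an association. With only two independent lineages, that is arguably the right answer.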

Other reviews also look interesting, for example, Johansson on Language abilities of Neandertals, Fisher and Vernes on genetics and linguistics, de Vos on village sign languages and Kroll et al. on bilingualism.

Ladd, D. R., Roberts, S. G., and Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1(1). preview


Mind-Culture Coevolution: Major Transitions in the Development of Human Culture and Society

This is revised from the introduction to a website I put up in the old days of web 1.0, all in hand-coded HTML. Where I’ve since uploaded downloadable versions of the documents I’ve used those links in this revised introduction, but you’re welcome to access the online versions from the old introduction.

Mind and Culture

A central phenomenon of the human presence on earth is that, over the long term, we have gained ever more capacity to understand and manipulate the physical world and, though some would debate this, the human worlds of psyche and society. The major purpose of the theory which the late David Hays and I have developed (and which I continue to develop) is to understand the mental structures and processes underlying that increased capacity. While more conventional students of history and of cultural evolution have much to say about what happened and when and what was influenced by what else, few have much to say about the conceptual and affective mechanisms in which these increased capacities are embedded. That is the story we have been endeavoring to tell.

Our theory is thus about processes in the human mind. Those processes evolve in tandem with culture. They require culture for their support while they enable culture through their capacities. In particular, we believe that the genetic elements of culture are to be found in the external world, in the properties of artifacts and behaviors, not inside human heads. Hays first articulated this idea in his book on the evolution of technology and I have developed it in my papers Culture as an Evolutionary Arena, Culture’s Evolutionary Landscape, in my book on music, Beethoven’s Anvil: Music in Mind and Culture, and in various posts at New Savanna and one for the National Humanities Center which I have aggregated into three working papers:

This puts our work at odds with some students of cultural evolution, especially those who identify with memetics, who tend to think of culture’s genetic elements as residing in nervous systems.

We have aspired to a system of thought in which the mechanisms of mind and feeling have discernible form and specificity rather than being the airy nothings of philosophical wish and theological hope. We would be happy to see computer simulations of the mechanisms we’ve been proposing. Unfortunately, neither the computational art nor our thinking has been up to this task. But that, together with the neuropsychologist’s workbench, is the arena in which these matters must eventually find representation, investigation, and, a long way down the line, resolution. The point is that, however vague our ideas about mechanisms currently may be, it is our conviction that the phenomenon under investigation, culture and its implementation in the human brain, is not vague and formless, nor is it, any more, beyond our ken.

Major Transitions

The story we tell is one of cultural paradigms existing at four levels of sophistication, which we call ranks. In the terminology of current evolutionary biology, these ranks represent major transitions in cultural life. Rank 1 paradigms emerged when the first humans appeared on the savannas of Africa speaking language as we currently know it. Those paradigms structured the lives of primitive societies which emerged perhaps 50,000 to 100,000 years ago. Around 5,000 to 10,000 years ago Rank 2 paradigms emerged in relatively large stable human societies with people subsisting on systematic agriculture, living in walled cities and reading written texts. Rank 3 paradigms first emerged in Europe during the Renaissance and gave European cultures the capacity to dominate, and in a sense to create, world history over the last 500 years. This century has begun to see the emergence of Rank 4 paradigms.


Vyv Evans: The Human Meaning-Making Engine

If you read my last post here at Replicated Typo to the very end, you may remember that I promised to recommend a book and to return to one of the topics of this previous post. I won’t do this today, but I promise I will catch up on it in due time.

What I just did – promising something – is a nice example of one of the two functions of language which Vyvyan Evans from Bangor University distinguished in his talk on “The Human Meaning-Making Engine” yesterday at the UK Cognitive Linguistics Conference. More specifically, the act of promising is an example of the interactive function of language, which is of course closely intertwined with its symbolic function. Evans proposed two different sources for these two functions. The interactive function, he argued, arises from the human instinct for cooperation, whereas meaning arises from the interaction between the linguistic and the conceptual system. While language provides the “How” of meaning-making, the conceptual system provides the “What”. Evans used some vivid examples (e.g. this cartoon exemplifying nonverbal communication) to make clear that communication is not contingent on language. However, “language massively amplifies our communicative potential.” The linguistic system, he argued, has evolved as an executive control system for the conceptual system. While the latter is broadly comparable with that of other animals, especially great apes, the linguistic system is uniquely human. What makes it unique, however, is not the ability to refer to things in the world, which can arguably be found in other animals as well. What is uniquely human, he argued, is the ability to symbolically refer in a sign-to-sign (word-to-word) direction rather than “just” in a sign-to-world (word-to-world) direction. Evans illustrated this “word-to-word” direction with Hans-Jörg Schmid’s (e.g. 2000; see also here) work on “shell nouns”, i.e. nouns “used in texts to refer to other passages of the text and to reify them and characterize them in certain ways.” For instance, the stuff I was talking about in the last paragraph would be an example of a shell noun.

According to Evans, the “word-to-word” direction is crucial for the emergence of e.g. lexical categories and syntax, i.e. the “closed-class” system of language. Grammaticalization studies indicate that the “open-class” system of human languages is evolutionarily older than the “closed-class” system, which is comprised of grammatical constructions (in the broadest sense). However, Evans also emphasized that there is a lot of meaning even in closed-class constructions, as e.g. Adele Goldberg’s work on argument structure constructions shows: We can make sense of a sentence like “Someone somethinged something to someone” although the open-class items are left unspecified.

Constructions, he argued, index or cue simulations, i.e. re-activations of body-based states stored in cortical and subcortical brain regions. He discussed this with the example of the cognitive model for Wales: We know that Wales is a geographical entity. Furthermore, we know that “there are lots of sheep, that the Welsh play Rugby, and that they dress in a funny way.” (Sorry, James. Sorry, Sean.) Oh, and “when you’re in Wales, you shouldn’t say, It’s really nice to be in England, because you will be lynched.”

On a more serious note, the cognitive models connected to closed-class constructions, e.g. simple past -ed or progressive -ing, are of course much more abstract but can also be assumed to arise from embodied simulations (cf. e.g. Bergen 2012). But in addition to the cognitive dimension, language of course also has a social and interactive dimension drawing on the apparently instinctive drive towards cooperative behaviour. Culture (or what Tomasello calls “collective intentionality”) is contingent on this deep instinct, which Levinson (2006) calls the “human interaction engine”. Evans’ “meaning-making engine” is the logical continuation of this idea.

Just like Evans’ theory of meaning (LCCM theory), his idea of the “meaning-making engine” is basically an attempt at integrating a broad variety of approaches into a coherent model. This might seem a bit eclectic at first, but it’s definitely not the worst thing to do, given that there is significant conceptual overlap between different theories which, however, tends to be blurred by terminological incongruities. Apart from Deacon’s (1997) “Symbolic Species” and Tomasello’s work on shared and joint intentionality, which he explicitly discussed, he draws on various ideas that play a key role in Cognitive Linguistics. For example, the distinction between open- and closed-class systems features prominently in Talmy’s (2000) Cognitive Semantics, as does the notion of the human conceptual system. The idea of meaning as conceptualization and embodied simulation of course goes back to the groundbreaking work of, among others, Lakoff (1987) and Langacker (1987, 1991), although empirical support for this hypothesis has been gathered only recently in the framework of experimental semantics (cf. Matlock & Winter forthc. – if you have an account at, you can read this paper here). All in all, then, Evans’ approach might prove an important further step towards integrating Cognitive Linguistics and language evolution research, as has been proposed by Michael and James in a variety of talks and papers (see e.g. here).

Needless to say, it’s impossible to judge from a necessarily fairly sketchy conference presentation if this model qualifies as an appropriate and comprehensive account of the emergence of meaning. But it definitely looks promising and I’m looking forward to Evans’ book-length treatment of the topics he touched upon in his talk. For now, we have to content ourselves with his abstract from the conference booklet:

In his landmark work, The Symbolic Species (1997), cognitive neurobiologist Terrence Deacon argues that human intelligence was achieved by our forebears crossing what he terms the “symbolic threshold”. Language, he argues, goes beyond the communicative systems of other species by moving from indexical reference – relations between vocalisations and objects/events in the world — to symbolic reference — the ability to develop relationships between words — paving the way for syntax. But something is still missing from this picture. In this talk, I argue that symbolic reference (in Deacon’s terms), was made possible by parametric knowledge: lexical units have a type of meaning, quite schematic in nature, that is independent of the objects/entities in the world that words refer to. I sketch this notion of parametric knowledge, with detailed examples. I also consider the interactional intelligence that must have arisen in ancestral humans, paving the way for parametric knowledge to arise. And, I also consider changes to the primate brain-plan that must have co-evolved with this new type of knowledge, enabling modern Homo sapiens to become so smart.



Bergen, Benjamin K. (2012): Louder than Words. The New Science of How the Mind Makes Meaning. New York: Basic Books.

Deacon, Terrence W. (1997): The Symbolic Species. The Co-Evolution of Language and the Brain. New York, London: Norton.

Lakoff, George (1987): Women, Fire, and Dangerous Things. What Categories Reveal about the Mind. Chicago: The University of Chicago Press.

Langacker, Ronald W. (1987): Foundations of Cognitive Grammar. Vol. 1. Theoretical Prerequisites. Stanford: Stanford University Press.

Langacker, Ronald W. (1991): Foundations of Cognitive Grammar. Vol. 2. Descriptive Application. Stanford: Stanford University Press.

Levinson, Stephen C. (2006): On the Human “Interaction Engine”. In: Enfield, Nick J.; Levinson, Stephen C. (eds.): Roots of Human Sociality. Culture, Cognition and Interaction. Oxford: Berg, 39–69.

Matlock, Teenie & Winter, Bodo (forthc.): Experimental Semantics. In: Heine, Bernd; Narrog, Heiko (eds.): The Oxford Handbook of Linguistic Analysis. 2nd ed. Oxford: Oxford University Press.

Schmid, Hans-Jörg (2000): English Abstract Nouns as Conceptual Shells. From Corpus to Cognition. Berlin, New York: De Gruyter (Topics in English Linguistics, 34).

Talmy, Leonard (2000): Toward a Cognitive Semantics. 2 vol. Cambridge, Mass: MIT Press.


Adaptive languages: Population structure and lexical diversity

A new paper by Bentz et al. is available for preview here. It is about a correlation between the lexical diversity of languages and the presence of non-native speakers in a population. This is particularly relevant to the work by Lupyan & Dale (2010), who found that morphological complexity within a language correlates with the population size of a language. It’s reasonable to expect that the percentage of second language speakers within a population will be affected by the size of a speaker population. There has been a lot of talk on this blog in the past about correlations between population structure and linguistic structure. There’s a pretty comprehensive page here covering some of the (spurious) correlations covered on the blog in the past. Bentz et al. are, however, aware of the criticisms raised by Sean and James in their PLOS ONE paper. They are all for a pluralistic approach and state that “there needs to be independent evidence for a causal relationship” before covering qualitative and quantitative evidence from other areas.

Here is the abstract for the interested:

Explaining the diversity of languages across the world is one of the central aims of historical  and evolutionary linguistics. This paper presents a quantitative approach to measure and  model a central aspect of this variation, namely the lexical diversity of languages. Lexical  diversity is defined as the breadth of word forms used to encode constant information content.  It is measured by means of comparing word frequency distributions for parallel translations of hundreds of languages. The measure is based on indices used in studies of biodiversity and in quantitative linguistics, i.e. Zipf-Mandelbrot’s law, Shannon entropy and type-token ratios. Three statistical models are given to elicit potential factors driving languages towards less diverse lexica. It is shown that the ratio of non-native speakers in languages predicts lower lexical diversity. This suggests that theories focusing on native acquisition as driving force of language change are incomplete. Instead, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: On one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
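For readers who want a feel for two of the indices named in the abstract, here is a minimal sketch of a type-token ratio and the Shannon entropy of a word frequency distribution. It is only an illustration: the function names and the two tiny "parallel translations" are made up, tokenisation is plain whitespace splitting, and it does not reproduce the paper's actual pipeline or its Zipf-Mandelbrot fitting.

```python
import math
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the word frequency distribution."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two invented "translations" of the same content. The second mimics a
# language that marks case on the noun, which spreads the same information
# over more distinct word forms.
text_a = "the dog sees the cat and the cat sees the dog".split()
text_b = "dog-NOM sees cat-ACC and cat-NOM sees dog-ACC".split()

for name, tokens in (("A", text_a), ("B", text_b)):
    print(name, type_token_ratio(tokens), shannon_entropy(tokens))
```

On these toy texts the case-marking version comes out with both a higher type-token ratio and a higher entropy, which is the sense in which it has the more diverse lexicon for constant information content.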


Language Evolution in the Infinite Monkey Cage

A couple of weeks ago there was an episode of the BBC’s Infinite Monkey Cage starring (as well as Robin Ince, Prof. Brian Cox and Ross Noble) none other than Keith Jensen and Katie Slocombe! Despite it being a comedy programme, the discussion around language is very sensible and informative, covering Slocombe’s work with chimpanzees as well as vervet monkeys. Robin also comes up with a not unreasonable experiment, involving throwing leopards through the air, to address some of the questions raised by the study on Diana monkeys I cover here.

You can listen by going here:

And clicking on: “Are Humans Uniquely Unique?”

Slocombe has been doing great work in the field of science communication for years now. You can check some of her activities here:

Also, I’d recommend other episodes of TIMC.

SpecGram Essential Guide to Linguistics: electronic version

The Speculative Grammarian Essential Guide to Linguistics is now available in electronic format. In the highest tradition of satire, this book gives a unique insight into the world of linguistics. It is crucial reading for any linguist who is trying to maintain a sense of perspective (or for those seeking the comfort of realising their own perspective is relatively grounded).

There’s also a special discount for the readers of Replicated Typo! Follow this link for 16.8% off.

From the blurb:

The book is written for linguists, by linguists. It’s about Linguistics and Language, but it’s not a textbook. Rather, it takes a sidelong look at all that is humorous about the field. Containing over 150 articles, poems, cartoons, humorous ads and book announcements—plus a generous sprinkling of quotes, proverbs and other witticisms—the book discovers things to laugh about in most major subfields of Linguistics.

What people have been saying:

“Don’t wait for Jon Stewart or Louis C.K. to do something with linguistics – it ain’t gonna happen. Just get this book and give a copy to everyone who needs a laugh.”

—Stephen Dodson, Languagehat

“[The Speculative Grammarian Essential Guide to Linguistics] will be a symbolic expression of your inner linguistic nerd.”

—Phaedra Royle, on Linguist List

“Complete with a choose-your-own-career-in-linguistics adventure game (German-sign-language-shaped dice not included), this is the ultimate gift for the budding language student, the jaded academic or the holistic forensic linguist.”

—Sean Roberts, A Replicated Typo

And just in time for Christmas.


Beyond Quantification: Digital Criticism and the Search for Patterns

I’ve collected some recent posts (from New Savanna) on patterns into a working paper. It’s online at SSRN. Here’s the abstract and the introduction.

Abstract: Literary critics seek patterns, whether patterns in individual texts or patterns in large collections of texts. Valid patterns are taken as indices of causal mechanisms of one sort or another. Most abstractly, a pattern emerges or is enacted as some machine makes its way in an environment. An ecological niche is a pattern “traced” by an organism in its environment. Literary texts are themselves patterns traced by writers (and readers) through their life worlds. Patterns are frequently described through visualizations. The concept of pattern thus dissolves the apparent conflict between quantification and meaning, for quantification is but a means to describing a pattern. It is up to the critic to determine whether or not a pattern is meaningful by identifying the mechanism that produced the pattern. Examples from Shakespeare and Joseph Conrad.

Introduction: Patterns and Descriptions

There is a sense, of course, in which I’ve been aware of and have been perceiving and thinking about patterns all my life. They are ubiquitous after all. But it wasn’t until I began studying cognitive science with the late David Hays that “pattern” became a term of art. Hays and his students were developing a network model of cognitive structure – such works became common in the 1970s. Such networks admit of two general kinds of computational process, path tracing and pattern recognition. Path tracing is computationally easy, while pattern recognition is not. Human beings, however, are very good at perceiving and recognizing patterns.

What put the idea before me, though, as something demanding specific thought, are remarks Franco Moretti made in coming to grips with his work on the network analysis of plot structure. In Network Theory, Plot Analysis (Literary Lab Pamphlet 2, 2011, p. 11) he noted that he “did not need network theory; but I probably needed networks…. What I took from network theory were less concepts than visualization.” We then examine the visualizations to determine whether or not they indicate patterns that are worth further exploration.


Systematic reviews 101: Internal and External Validity

Who remembers last summer when I started writing a series of posts on systematic literature reviews?

I apologise for neglecting it for so long, but here is a quick write up on assessing the studies you are including in your review for internal and external validity, with special reference to experiments in artificial language learning and evolutionary linguistics (though this is relevant to any field which aspires to adopt the scientific method).

In the first post in the series, I outlined the differences between narrative and systematic reviews. One of the defining features of a systematic review is that it is not written with a specific hypothesis in mind. The literature search (which my next post will be about) is conducted with predefined inclusion criteria and, as a result, you will end up with a pile of studies to review regardless of their conclusions, or indeed regardless of their quality. Because there is no filter to catch bad science, we need methods to assess the quality of a study or experiment, which is what this post will be about.

(This will also help with DESIGNING a valid experiment, as well as assessing the validity of other people’s.)

What is validity?

Validity is the extent to which a conclusion is a well-founded one given the design and analysis of an experiment. It comes in two different flavours: external validity and internal validity.

External Validity

External validity is the extent to which the results of an experiment or study can be extrapolated to different situations. This is EXTREMELY important in the case of experiments in evolutionary linguistics because the whole point of experiments in evolutionary linguistics is to extrapolate your results to different situations (i.e. the emergence of linguistic structure in our ancestors), and we don’t have access to our ancestors to experiment on.

Here are some of the things that affect an experiment’s external validity (in linguistics/psychology):

  • Participant characteristics (age (especially important in language learning experiments), gender, etc.)
  • Sample size
  • Type of learning/training (important in artificial language learning experiments)
  • Characteristics of the input (e.g. the nature of the structure in an input language)
  • Modality of the artificial language (how similar to actual linguistic modalities?)
  • Modality of output measures (how the outcome was measured and analysed)
  • The task from which the output was produced (straightforward imitation or communication or some other task)

Internal Validity

Internal validity is how well an experiment reduces its own systematic error within the circumstances of the experiment being performed.

Here are some of the things that affect an experiment’s internal validity:

  • Selection bias (who’s doing the experiment and who gets put in which condition)
  • Performance bias (differences between conditions other than the ones of interest, e.g. running people in condition one in the morning and condition two in the afternoon)
  • Detection bias (how the outcome measures are coded and interpreted. Blinding the coder to which condition a participant is in is paramount to reduce the researcher’s bias towards finding a difference between conditions. A lot of retractions lately have been down to failures to act against detection bias.)
  • Attrition bias (Ignoring drop-outs, especially if one condition is especially stressful, causing high drop-out rates and therefore bias in the participants who completed it. This probably isn’t a big problem in most evolutionary linguistics research, but may be in other psychological stuff.)

Different types of bias will be relevant to different fields of research and different research questions, so it may be an idea to come up with your own scoring method for validity to subject different studies to within your review. But remember to be explicit about what your scoring methods are, and the pros and cons of the studies you are writing about.
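As a rough illustration of what an explicit scoring method could look like, here is a minimal sketch that scores each study on the four internal validity biases listed above. Everything in it is an assumption made for the example: the criterion names, the low/unclear/high judgements, the point values and the two studies are all hypothetical, and you would want to adapt them (and add external validity criteria) for your own review.

```python
from dataclasses import dataclass

# Hypothetical criteria, taken from the bias types listed in this post.
CRITERIA = ("selection", "performance", "detection", "attrition")

# Hypothetical point values: a lower risk of bias earns more points.
POINTS = {"low": 2, "unclear": 1, "high": 0}


@dataclass
class StudyAssessment:
    study: str
    judgements: dict  # maps each criterion to "low", "unclear" or "high"

    def score(self):
        """Sum the points over all criteria, insisting that none is skipped."""
        missing = [c for c in CRITERIA if c not in self.judgements]
        if missing:
            raise ValueError(f"{self.study}: no judgement for {missing}")
        return sum(POINTS[self.judgements[c]] for c in CRITERIA)


MAX_SCORE = len(CRITERIA) * POINTS["low"]

# Two made-up studies, purely for illustration.
assessments = [
    StudyAssessment("Study A", {"selection": "low", "performance": "low",
                                "detection": "low", "attrition": "low"}),
    StudyAssessment("Study B", {"selection": "low", "performance": "unclear",
                                "detection": "high", "attrition": "low"}),
]

for a in assessments:
    print(f"{a.study}: {a.score()} / {MAX_SCORE}")
```

Writing the rubric down like this forces you to make a judgement on every criterion for every study, and it means the scoring scheme can be reported alongside the review.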

Hopefully this introduction will have helped you think about validity within experiments in what you’re interested in, and helped you take an objective view on assessing the quality of studies you are reviewing, or indeed conducting.


PhD Opportunities: The Wellsprings of Linguistic Diversity

PhD positions are available at ANU, working with a team of people investigating diversity and cultural evolution.  The call is below:

Applications are now being sought for three PhD positions on the project ‘The Wellsprings of Linguistic Diversity’, funded by the Australian Research Council for the period mid-2014 to mid-2019.

Each PhD position will undertake substantial fieldwork on variation in a particular speech community: Western Arnhem Land (Bininj Gun-wok and neighbouring areas), Vanuatu (Sa and adjoining languages, South Pentecost Island), and Samoa (Samoan). Support will include a four-year stipend ($29,844 p/a), generous fieldwork funding, and embedding of the doctoral research in the dynamic team setting of the project, as well as the newly established ARC Centre of Excellence for the Dynamics of Language.  Positions will start in early February 2015.

The project is led by Prof. Nick Evans, and the project team includes postdocs Dr Murray Garde, Dr Ruth Singer, and Dr Dineke Schokkin and doctoral scholar Eri Kashima (fieldworkers), postdoc Dr Mark Ellison (computational modelling), and consultants Profs. Miriam Meyerhoff and Catherine Travis (variationist sociolinguistics) and Emeritus Prof. Andy Pawley (Samoan).

The project’s goal is to understand why linguistic diversity evolves differentially in different parts of the world, through a combination of detailed sociolinguistic case-studies of small-scale speech communities in their anthropological setting, and computational modelling of how micro-variation engenders macro-variation over iterations of transmission. The three high-diversity field sites are western Arnhem Land (Bininj Gun-wok and neighbouring languages), Morehead district of Southern New Guinea (Nen, Nambu, Idi), and South Pentecost Island, Vanuatu (Sa and neighbouring languages). Samoa (Samoan) supplies a low-diversity comparator to Vanuatu, and controls from small speech communities in global languages (English and Spanish) will be obtained by other investigators on the project.

A fuller description of the project can be downloaded from

General information about the doctoral program in School of Culture, History and Language at the ANU College of Asia and the Pacific can be found at

Specific enquiries should be directed to Nick Evans ( and completed application dossiers sent to Completed applications should include the following information:
(a)    CV with educational qualifications, any publications and other relevant experience (e.g. fieldwork, relevant internships)
(b)    a two-page statement setting out your preferred field site or sites, what skills and personal attributes you will bring to the project, and what you see as the most interesting and challenging issues you will need to solve
(c)    if available, other materials supporting your case (e.g. relevant articles or other materials)

Deadline:  Aug 3rd 2014, midnight, AEST

Once awards are made, successful applicants will be notified and then guided through making a formal application for enrolment status through the regular ANU system.

