How spurious correlations arise from inheritance and borrowing (with pictures)

James and I have written about Galton’s problem in large datasets.  Because two modern languages can have a common ancestor, the traits that they exhibit aren’t independent observations.  This can lead to spurious correlations: patterns in the data that are statistical artefacts rather than indications of causal links between traits.

However, I’ve often felt like we haven’t articulated the general concept very well.  For an upcoming paper, we created some diagrams that try to present the problem in its simplest form.

Spurious correlations can be caused by cultural inheritance 

[Figure: three ancestral cultures splitting into descendant cultures, with a contingency table of triangle and square colours]

Above is an illustration of how cultural inheritance can lead to spurious correlations.  At the top are three independent historical cultures, each with a bundle of traits represented as coloured shapes.  Each trait is causally independent of the others.  On the right is a contingency table for the colours of triangles and squares: there is no particular relationship between the colour of a culture's triangle and the colour of its square.  Over time, however, these cultures split into new cultures.  Along the bottom of the diagram are the currently observable cultures, and a pattern has now emerged in the raw numbers (pink triangles occur with orange squares, and blue triangles occur with red squares).  The mechanism that brought about this pattern is simply that the traits are inherited together, with some combinations replicating more often than others: there is no causal mechanism whereby pink triangles make orange squares more likely.
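To make the mechanism concrete, here is a minimal simulation sketch in Python. It is a toy model of our own devising (the trait colours, the three ancestors, and the uneven replication rates are all assumptions for illustration), not the model from the paper:

```python
import random
from collections import Counter

random.seed(42)

# Three independent ancestral cultures. Each carries two causally
# independent binary traits: triangle colour and square colour.
ancestors = [(random.choice(["pink", "blue"]),
              random.choice(["orange", "red"]))
             for _ in range(3)]

# Each ancestor splits into descendants. Both traits are inherited
# together, and lineages replicate unevenly.
descendants = []
for culture in ancestors:
    for _ in range(random.randint(1, 6)):  # uneven replication
        descendants.append(culture)

# The contingency table over the observable (descendant) cultures can
# show a strong triangle/square association even though neither trait
# ever influenced the other.
print(Counter(descendants))
```

Because both traits ride along in the same lineages, whichever colour combinations happen to replicate most will dominate the contingency table, even though the traits never interact.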

Spurious correlations can be caused by borrowing

[Figure: a blue triangle borrowed between neighbouring cultures over successive generations, with a running count of cultures having both blue triangles and red squares]

Above is an illustration of how borrowing (also known as areal effects or horizontal cultural transmission) can lead to spurious correlations.  Three cultures (left to right) evolve over time (top to bottom).  Each culture has a bundle of traits represented as coloured shapes, and each trait is causally independent of the others.  On the right is a count of the number of cultures with both blue triangles and red squares.  In the top generation, only one of the three cultures has both.  Over some period of time, the blue triangle is borrowed from the culture on the left by the culture in the middle, and then from the culture in the middle by the culture on the right.  By the end, all three cultures have blue triangles and red squares.  The mechanism that brought about this pattern is simply that one trait spread through the population: there is no causal mechanism whereby blue triangles make red squares more likely.
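The same scenario in a few lines of Python (a toy sketch with assumed starting colours, mirroring the diagram rather than implementing any published model):

```python
# Three cultures, left to right; starting traits mirror the diagram.
cultures = [
    {"triangle": "blue", "square": "red"},   # left
    {"triangle": "pink", "square": "red"},   # middle
    {"triangle": "green", "square": "red"},  # right
]

def both_blue_and_red(pop):
    """Count cultures with both a blue triangle and a red square."""
    return sum(c["triangle"] == "blue" and c["square"] == "red"
               for c in pop)

print(both_blue_and_red(cultures))  # 1 of 3 before any borrowing

# The blue triangle diffuses neighbour to neighbour:
# left -> middle, then middle -> right.
for i in (1, 2):
    cultures[i]["triangle"] = "blue"

print(both_blue_and_red(cultures))  # 3 of 3 after borrowing
```

The co-occurrence count climbs from one to three without any causal link between the two traits: one trait simply spread.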

A similar effect would be caused by a bundle of causally unrelated features being borrowed, as shown below.

[Figure: a bundle of causally unrelated traits borrowed together between cultures]

Empty Constructions and the Meaning of “Meaning”

Textbooks are boring. In most cases, they consist of a rather tiring collection of more or less undisputed facts, and they omit the really interesting stuff such as controversial discussions or problematic cases that pose a serious challenge to a specific scientific theory. However, Martin Hilpert’s “Construction Grammar and its Application to English” is an admirable exception, since it discusses various potential problems for Construction Grammar at length. What I found particularly interesting was the problem of “meaningless constructions”. In what follows, I will present some examples of such constructions and discuss what they might tell us about the nature of linguistic constructions. First, however, I will outline some basic assumptions of Construction Grammar. Continue reading “Empty Constructions and the Meaning of “Meaning””

John Lawler on Generative Grammar

From a Facebook conversation with Dan Everett (about slide rules, aka slipsticks, no less) and others:

The constant revision and consequent redefining and renaming of concepts – some imaginary and some very obvious – has led to a multi-dimensional spectrum of heresy in generative grammar, so complex that one practically needs chromatography to distinguish variants. Babel comes to mind, and also Windows™ versions. Most of the literature is incomprehensible in consequence – or simply repetitive, except it’s too much trouble to tell which.

–John Lawler

What’s a Language? Evidence from the Brain

Yesterday I put up a post (A Note on Memes and Historical Linguistics) in which I argued that, when historical linguists chart relationships between things they call “languages”, what they’re actually charting is mostly relationships among phonological systems. Though they talk about languages, as we ordinarily use the term, that’s not what they actually look at. In particular, they ignore horizontal transfer of words and concepts between languages.

Consider the English language, which is classified as a Germanic language. As such, it is different from French, which is a Romance language, though of course both Romance and Germanic languages are Indo-European. However, in the 11th Century CE the Norman French invaded Britain and they stuck around, profoundly influencing language and culture in Britain, especially the part that’s come to be known as England. Because of their focus on phonology, historical linguists don’t register this event and its consequences. The considerable French influence on English simply doesn’t count because it affected the vocabulary, but not the phonology.

Well, the historical linguists aren’t the only ones who have a peculiar view of their subject matter. That kind of peculiar vision is widespread.

Let’s take a look at a passage from Sydney Lamb’s Pathways of the Brain (John Benjamins 1999). He begins by talking about Roman Jakobson, one of the great linguists of the previous century:

Born in Russia, he lived in Czechoslovakia and Sweden before coming to the United States, where he became a professor of Slavic Linguistics at Harvard. Using the term language in a way it is commonly used (but which gets in the way of a proper understanding of the situation), we could say that he spoke six languages quite fluently: Russian, Czech, German, English, Swedish, and French, and he had varying amounts of skill in a number of others. But each of them except Russian was spoken with a thick accent. It was said of him that, “He speaks six languages, all of them in Russian”. This phenomenon, quite common except in that most multilinguals don’t control as many ‘languages’, actually provides excellent evidence in support of the conclusion that from a cognitive point of view, the ‘language’ is not a unit at all.

Think about that. “Language” is a noun, nouns are said to represent persons, places, or things – as I recall from some classroom long ago and far away. Language isn’t a person or a place, so it must be a thing. And the generic thing, if it makes any sense at all to talk of such, is a self-contained ‘substance’ (to borrow a word from philosophy), demarcated from the rest of the world. It is, well, it’s a thing, like a ball, you can grab it in your metaphorical hand and turn it around as you inspect it. Continue reading “What’s a Language? Evidence from the Brain”

“Hierarchical structure is rarely…needed to explain how language is used in practice”

How hierarchical is language use?

Stefan L. Frank, Rens Bod and Morten H. Christiansen

Abstract: It is generally assumed that hierarchical phrase structure plays a central role in human language. However, considerations of simplicity and evolutionary continuity suggest that hierarchical structure should not be invoked too hastily. Indeed, recent neurophysiological, behavioural and computational studies show that sequential sentence structure has considerable explanatory power and that hierarchical processing is often not involved. In this paper, we review evidence from the recent literature supporting the hypothesis that sequential structure may be fundamental to the comprehension, production and acquisition of human language. Moreover, we provide a preliminary sketch outlining a non-hierarchical model of language use and discuss its implications and testable predictions. If linguistic phenomena can be explained by sequential rather than hierarchical structure, this will have considerable impact in a wide range of fields, such as linguistics, ethology, cognitive neuroscience, psychology and computer science.

Published online before print September 12, 2012, doi: 10.1098/rspb.2012.1741
Proceedings of the Royal Society B

Full text online HERE.

What Does It Mean To Mean?

I’ve been agonizing somewhat over what to write as my first post. I am currently delving into the wonderful world of pragmatics via a graduate seminar at the University of Virginia, but I do not yet feel proficient enough to comment on the complex philosophical theories that I am reading. So, I am going to briefly present an overview of what I will be attempting to accomplish in my year-and-a-half-long thesis project. Upcoming entries will most likely be related to this topic, similar topics, and research that bears on the outcome of my investigation.

I was recently watching a debate between Richard Dawkins and Rowan Williams, the Archbishop of Canterbury, on the nature of the human species and its origin. To no one’s surprise, language was brought up when discussing human origins, specifically recursive, productive language as a distinguishing marker of the human species. What may seem obvious to the evolutionary linguists here actually comes with some interesting problems from a biological perspective. As Dawkins discusses in the debate, pinpointing the emergence of a new species is rather difficult in the animal kingdom. Whereas for plants there may be distinct moments at which one can point and say “Here is when a new species emerged!”, this identifiable moment is less overt for animals. One key problem with determining the exact moment of a new species’ emergence is the question of interbreeding.

If we consider the development of a language (a system of communication with the aforementioned characteristics) to be a marker of the human species, then do we suppose that at some point a child emerged with the ability to form a language, despite having mute or animalistic parents? To whom would the child speak? If Dawkins is correct and language is partially rooted in a specific gene, we could theorize that the “first” human with the gene would thereby mate with proto-humans lacking the gene. All of this is, of course, very sketchy and difficult to elucidate, as even the theory that language is rooted in a gene can be disputed. The problem remains an integral one: not only does it bear on our understanding of the evolutionary origins of language, but, as the philosophers in my pragmatics class would point out, it also has significant bearing on ontological and ethical questions regarding human origins.

I do not hope to solve this entire issue in my senior thesis; however, I do hope to show the development of language less as a suddenly produced trait and more as a gradual process from a less developed system of communication to a more developed one. From a pragmatics point of view, the question might be: how do we jump the gap, so to speak, between less developed systems of communication (conventionally, these include animal communication, natural meaning, etc.) and the fully fledged, unique system of human language? Paul Grice, as one might discover in my handy dandy Wikipedia link above, proposed a distinction between natural meaning, which he defined as a cause/effect indication considered in terms of its factivity, and non-natural meaning, a communicative action that must be considered in terms of the speaker’s intentions. Yet, as stated above, the question remains: how do we (evolutionarily) progress from natural meaning to non-natural meaning?

Not to overly simplify, but my answer rests in the question of what it means to mean something. I hope to show, in my subsequent posts, that an investigation into semantics, and, more specifically, a natural progression through a hierarchy of types of meaning, might shed light on this problem. In short, taking a look at the development of meaning, intent, and the qualifications for a language proper can shed light on how language developed into the complex, unique phenomenon we study today.  (Oh, and to satisfy the philosophers in my class, I may ramble occasionally about the implications for a philosophical conception of our species!)

 

Cognitivism and the Critic 2: Symbol Processing

It has long been obvious to me that the so-called cognitive revolution is what happened when computation – both the idea and the digital technology – hit the human sciences. But I’ve seen little reflection of that in the literary cognitivism of the last decade and a half. And that, I fear, is a mistake.

Thus, when I set out to write a long programmatic essay, Literary Morphology: Nine Propositions in a Naturalist Theory of Form, I argued that we think of literary text as a computational form. I submitted the essay and found that both reviewers were puzzled about what I meant by computation. While publication was not conditioned on providing such satisfaction, I did make some efforts to satisfy them, though I’d be surprised if they were completely satisfied by those efforts.

That was a few years ago.

Ever since then I’ve pondered the issue: how do I talk about computation to a literary audience? You see, some of my graduate training was in computational linguistics, so I find it natural to think about language processing as entailing computation. As literature is constituted by language, it too must involve computation. But without some background in computational linguistics or artificial intelligence, I’m not sure the notion is much more than a buzzword that’s been trendy for the last few decades – and that’s an awfully long time to be trendy.

I’ve already written one post specifically on this issue: Cognitivism for the Critic, in Four & a Parable, where I write abstracts of four texts which, taken together, give a good feel for the computational side of cognitive science. Here’s another crack at it, from a different angle: symbol processing.

Operations on Symbols

I take it that ordinary arithmetic is most people’s ‘default’ case for what computation is. Not only have we all learned it, it’s fundamental to our knowledge, like reading and writing. Whatever we know, think, or intuit about computation is built on our practical knowledge of arithmetic.

As far as I can tell, we think of arithmetic as being about numbers. Numbers are different from words. And they’re different from literary texts. And not merely different: some of us – many of whom study literature professionally – have learned that numbers and literature are deeply and utterly different, to the point of being fundamentally opposed to one another. From that point of view, the notion that literary texts can be understood computationally is little short of blasphemy.

Not so. Not quite.

The question of just what numbers are – metaphysically, ontologically – is well beyond the scope of this post. But what they are in arithmetic, that’s simple: they’re symbols. Words too are symbols, and literary texts are constituted of words. In this sense, perhaps superficial but nonetheless real, reading literary texts and making arithmetic calculations are the same kind of thing: operations on symbols. Continue reading “Cognitivism and the Critic 2: Symbol Processing”
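A tiny sketch can make the point tangible. The following is my own illustration, not anything from the essay: addition done purely as string manipulation over unary “tally” numerals, next to equally symbolic operations on words.

```python
# Addition as a purely symbolic operation: numerals are tally strings,
# and adding is just concatenation. No appeal to what numbers "are".
def add(a: str, b: str) -> str:
    return a + b

three, four = "|||", "||||"
print(add(three, four))  # '|||||||' -- seven, as a symbol string

# Operations on words are symbol manipulation of exactly the same kind:
# rule-governed transformations of strings.
sentence = "the cat sat on the mat"
print(sentence.split())                      # one string into word symbols
print(" ".join(reversed(sentence.split())))  # 'mat the on sat cat the'
```

In both cases the machinery never “knows” what the symbols mean; it only applies rules to their forms, which is the sense of computation at issue here.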

Phonology and Phonetics 101: Vowels pt 1

In phonetics and phonology there is an important distinction to be made between two broad categories of sound: consonants and vowels. For this post, however, I will be focusing on the second, considered by some to be the more problematic, division. So, what are vowels? For one, they probably aren’t just the vowels (a, e, i, o, u) you were taught in school. This is one of the big problems when teaching the sound system of a language with such an entrenched writing system as English, especially when there is a big disconnect between the sounds you make in speech and the representation of sound in orthography. To give a simple example: how many different vowels are there in bat, bet, arm, and say? Well, if you were in school, then a typical answer would be two: a and e. In truth, from a phonological standpoint, there are four different vowels: [æ], [e], [ɑː], [eɪ]. The point that vowel sounds are different from vowel letters is an easy one to get across. The difficulty arises in actually providing a working definition. So, again, I ask:

What are vowels?

Continue reading “Phonology and Phonetics 101: Vowels pt 1”
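The letters-versus-sounds contrast above can be played with in a few lines of Python. This is a toy sketch using the post’s own transcriptions (the IPA values are illustrative, not a full phonemic analysis):

```python
# Map each example word to the vowel sound given in the post.
words = {
    "bat": "æ",
    "bet": "e",
    "arm": "ɑː",
    "say": "eɪ",
}

# Vowel *letters* in the spellings vs. distinct vowel *sounds*.
vowel_letters = {ch for w in words for ch in w if ch in "aeiou"}
vowel_sounds = set(words.values())

print(sorted(vowel_letters))  # ['a', 'e'] -- the "two vowels" answer
print(sorted(vowel_sounds))   # four distinct vowel sounds
```

Two letters, four sounds: the orthography collapses distinctions that the sound system keeps apart.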

Phonology and Phonetics 101

What I’m going to try and do in this series of posts is follow my phonology module at Cardiff. As such, these posts are essentially my notes on the topic, and may not always come across too clearly. First, I thought it would be useful to give a quick definition of both phonology and phonetics, before moving on to discuss the anatomical organisation of our vocal organs.

Phonetics and Phonology

To begin, phonetics, often referred to as the science of speech sound, is concerned with the physical production, acoustic transmission and perception of human speech sounds (see: phone). One key element of phonetics is the use of transcription to provide a one-to-one mapping between phones and written symbols (something I’ll come back to in a later post). In contrast, phonology focuses on the systematic use of sound in language to encode meaning. So, whereas phonetics is specifically concerned with human speech sounds, phonology, despite having a grounding in phonetics, links in with other levels of language through abstract sound systems and gestures. SIL provides a useful little diagram showing where phonetics and phonology lie in relation to other linguistic disciplines:

Continue reading “Phonology and Phonetics 101”

That’s Linguistics (Not logistics)


Linguists really need a catchy tune to match those in logistics. Any takers?

I always remember one of my former lecturers saying he was surprised by how little the average person knows about linguistics. For me, this was best exemplified when, upon enquiring about my degree, my friend paused for a brief moment and said: “Linguistics. That’s like logistics, right?” Indeed. Not really being in the mood to bash my friend’s ignorance into a bloody pulp of understanding, I decided to take a swig of my beer and simply replied: “No, not really. But it doesn’t matter.” Feeling guilty for not gathering the entire congregation of party-goers, sitting them down and proceeding to explain the fundamentals of linguistics, I have instead decided to write a series of 101 posts.

With that said, a good place to start is by providing some dictionary definitions highlighting the difference between linguistics and logistics:

Linguistics /lɪŋˈgwɪs.tɪks/ noun

the systematic study of the structure and development of language in general or of particular languages.

Logistics /ləˈdʒɪs.tɪks/ plural noun

the careful organization of a complicated activity so that it happens in a successful and effective way.

Arguably, linguistics is a logistical solution for successfully, and rigorously, studying language through the scientific method, but to avoid further confusion this is the last time you’ll see logistics in these posts. So, as you can probably infer, linguistics is a fairly broad term that, for all intents and purposes, simply means it’s a discipline for studying language. Those who partake in the study of language are known as linguists. This leads me to another point of contention: a linguist isn’t synonymous with a polyglot. Although there are plenty of linguists who do speak more than one language, many of them are quite content just sticking to their native language. It is, after all, possible for linguists to study many aspects of a language without necessarily having anything like native-level competency. In fact, other than occasionally shouting pourquoi when (drunkenly) reflecting on my life choices, or ach-y-fi when a Brussels sprout somehow manages to make its way near my plate, I’m mainly monolingual.

Continue reading “That’s Linguistics (Not logistics)”