Niche as a determinant of word fate in online groups (featuring @hanachronism and @richlitt)

Last year Altmann, Pierrehumbert & Motter (henceforth, APM) released a great paper in PLoS One: Niche as a determinant of word fate in online groups. Having referenced the paper extensively in my non-bloggy academic world, I thought it was about time I mentioned it on Replicated Typo. Below is the abstract:

Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.
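To make "the range of individuals using the word" a bit more concrete, here is a minimal sketch of one way to measure it while controlling for frequency: compare the number of users who actually use a word with the number you would expect if its occurrences were scattered at random across everyone's posts. The random-scatter (Poisson-style) baseline, the function names and the toy numbers are my own assumptions for illustration, not APM's code.

```python
import math

def expected_users(word_count, total_tokens, tokens_per_user):
    """Expected number of distinct users of a word if its occurrences were
    scattered at random over all tokens (an assumed Poisson-style baseline)."""
    f = word_count / total_tokens  # relative frequency of the word
    # A user who contributed m tokens uses the word at least once
    # with probability 1 - exp(-f * m) under this baseline.
    return sum(1 - math.exp(-f * m) for m in tokens_per_user)

def user_dissemination(observed_users, word_count, total_tokens, tokens_per_user):
    """Ratio of observed to expected users: below 1 means the word is
    clumped among few users, above 1 means it is unusually widespread."""
    return observed_users / expected_users(word_count, total_tokens, tokens_per_user)

# Toy community (hypothetical numbers): 10 users, 1,000 tokens each.
tokens_per_user = [1000] * 10
total_tokens = sum(tokens_per_user)

# The same word frequency (20 occurrences), but very different niches.
clumped = user_dissemination(2, 20, total_tokens, tokens_per_user)
widespread = user_dissemination(9, 20, total_tokens, tokens_per_user)
print(f"clumped: {clumped:.2f}, widespread: {widespread:.2f}")
```

The point of dividing by a baseline is that two words with identical frequency can still occupy very different niches, which is exactly the contrast the abstract draws between niche size and raw frequency.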


Evolution of the Speech Code: Higher-Order Symbolism and the Linguistic Big Bang

Two months ago Daniel Silverman (San Jose State University) gave a talk at the LEC on the Evolution of the Speech Code: Higher-Order Symbolism and the Linguistic Big Bang. With his permission, I’ve posted below a PDF of a paper he’s written based on the talk — it’s really fascinating stuff and chock-a-block with ideas. Keep in mind that it’s a work in progress, but I’m sure he’ll appreciate any (informative) comments. So, on that note, go and read:


Higgs Boson and Big Data

It’s not about cultural evolution, but I think most people who have even a passing interest in science are gearing up to welcome the Higgs boson to the elementary particle party. Anyway, here’s a nicely put together video explaining what the Higgs boson is and why its discovery is significant:

The Higgs Boson Explained from PHD Comics on Vimeo.

There’s also a more general point about needing to gather a huge amount of data (15 petabytes a year — enough to fill more than 1.7 million dual-layer DVDs a year) to find the very small effect size that is predicted for the Higgs Boson. In itself, data of this magnitude will likely come with significantly more noise, which means physicists have needed to develop well-defined statistical methods (they even have their own statistics committee). It really is a massive achievement for modern science.
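As a quick sanity check on the DVD comparison, here is the back-of-the-envelope arithmetic. I am assuming decimal petabytes (10^15 bytes) and the nominal 8.5 GB capacity of a dual-layer DVD; neither figure is spelled out in the original claim.

```python
PETABYTE = 10**15               # bytes, decimal convention (assumed)
DVD_DUAL_LAYER = 8.5 * 10**9    # bytes, nominal dual-layer capacity (assumed)

def discs_needed(n_bytes, disc_bytes=DVD_DUAL_LAYER):
    """Number of discs required to hold n_bytes of data."""
    return n_bytes / disc_bytes

dvds_per_year = discs_needed(15 * PETABYTE)
print(f"{dvds_per_year / 1e6:.2f} million discs per year")
```

Under those assumptions the result lands just above 1.7 million, so the figure holds up.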


From Grooming to Speaking: Recent trends in social primatology and human ethology (Conference Announcement)

Should be of interest to some readers:

The Centre for Philosophy of Science of the Faculty of Science of the Portuguese University of Lisbon is organizing a 3-day international colloquium entitled “From Grooming to Speaking: recent trends in social primatology and human ethology”, on September 10-12th, 2012.

Conference website


Nothing in Language Makes Sense…

… Except in the Light of Biological and Cultural Evolution

Sean mentioned in one of his many Evolang posts that, based on de Boer’s talk, the real audience for researchers of cultural evolution should be biologists. Well, reasoning that actions and words work far better together, I got in contact with Jeremy Yoder of the excellent group blog, Nothing in Biology Makes Sense. The result: an introductory post on the biological and cultural evolution of language called Crossing Those Curious Parallels (after Darwin’s famous passage describing the similarities between linguistic and biological change). Most regular readers will be familiar with the content and argument as the article is a pastiche of earlier pieces I wrote on this blog, but there is a sprinkling of some original paragraphs here and there. So feel free to go over, leave a comment and help foster some cross-disciplinary discussions. Actually, on a cross-disciplinary note: since physicists seem so keen to solve problems in linguistics, maybe we should lend them a hand and run a corpus analysis to discover that elusive mass of the Higgs boson.



So, what is it then, this Grammaticalization?

A century ago Antoine Meillet, in his work L’évolution des Formes Grammaticales, coined the term grammaticalization to describe the process through which linguistic forms evolve from a lexical to a grammatical status. Even though knowledge of this process is found in earlier works by French and British philosophers (e.g. Condillac, 1746; Tooke, 1857), as well as in the publications of a long list of nineteenth-century linguists beginning with Franz Bopp (1816) (cf. Heine, 2003), it was Meillet’s term that would come to characterise what is now a whole field of study in historical language change. At first glance, the concept of grammaticalization might seem fairly straightforward, yet in the intervening hundred years it has undergone numerous revisions and developments, with many of these issues being brought to the fore at a conference I recently attended in Berlin (yes, there are other conferences we’re interested in than Evolang).

One of the stated aims of the conference was to refine the notion of grammaticalization (click here for the website). I’m not 100% sure this was achieved and, following an excellent talk by Graeme Trousdale, I was even less sure of whether we should keep using the term. We’ll come back to this in a moment. For now, many linguists will probably agree that one of the most prominent developments is found in the expansion of Meillet’s definition by Kuryłowicz (1965): “[…] grammaticalization is that subset of linguistic changes whereby lexical material in highly constrained pragmatic and morphosyntactic contexts becomes grammatical, and grammatical material becomes more grammatical […]” (Traugott, 1996: 183 [my emphasis]). Under this new definition, grammaticalization takes into account the gradual nature of diachronic change in language, with there being a continuum of various degrees of grammatical status (Winter-Froemel, 2012).

A widely used example of grammaticalization is the development of the periphrastic future be going to. In the time of Shakespeare’s English, be going to had its literal meaning of a subject travelling to a location in order to do something, with the subject position only allowing for a noun phrase denoting “an animate, mobile entity, and the verb following the phrase would have to be a dynamic verb” (Bybee, 2003: 605). Indeed, there were several movement verbs that we could substitute based on the following constructional schema:

(1)        [[movement verb + Progressive] + purpose clause (to + infinitive)]

            E.g.,     I am going to see the king

                        I am travelling to see the king

                        I am riding to see the king

However, of the above examples, it was only the construction with go in it that underwent grammaticalization so that the motion verb (go) and the purpose clause (to + infinitive) came to express intentionality and future possibility. Of course, these changes did not happen abruptly, but rather they gradually evolved over time, with one prediction being that there was a stage of ambiguity where both meanings coexisted (see Hopper’s concept of layering). We might conceive of this as hidden variation due to the inferential capacities entailed in the transmission from speakers to hearers. At some point be going to was used in a construction that has an unambiguous meaning (e.g., I’m going to stay at home; The tree is going to lose its leaves, etc.), which led to an unmasking of this hidden variation within the speech community. This unmasking further opens up the possibility for these two meanings to become structurally untangled, as demonstrated in the contracted form of be plus the reduced gonna [gʌnə]. Below is a diagrammatic representation of these changes:


Phonemic Diversity and Vanishing Phonemes: Looking for Alternative Hypotheses

In my last post on the vanishing phonemes debate I briefly mentioned Atkinson’s two major theoretical points: (i) that there is a link between phoneme inventory sizes, mechanisms of cultural transmission and the underlying demographic processes supporting these changes; (ii) that we could develop a Serial Founder Effect (SFE) model from Africa based on the phoneme inventory size. I also made the point that more work was needed on the details of the first claim before we went ahead and tested the second. To me at least, it seems slightly odd to assume the first model is correct, without really going to any great lengths to disprove it, and then go ahead and commit the statistical version of the narrative fallacy: you find a model that fits the past and use it to tell a story. Still, I guess the best way to get in the New York Times is to come up with a Human Origins story, and leave the boring phonemes as a peripheral detail.

Unrealistic Assumptions?

One problem with these unrealistic assumptions is they lead us to believe there is a linear relationship between a linguistic variable (e.g. phoneme inventories) and a socio-demographic variable (e.g. population size). The reality is far more complicated. For instance, Atkinson (2011) and Lupyan & Dale (2010) both consider population size as a variable. Where the two differ is in their theoretical rationale for the application of this variable: whereas the former is interested in how these population dynamics impact upon the distribution of variation, the latter sees social structure as a substantial feature of the linguistic environment in which a language adapts. It is sometimes useful to tell simple stories, and abstract away from the messy complexities of life, yet for the relationships being discussed here, I think we’re still missing some vital points.


A Cautionary Tale: Linguists are a powerful force of change (for phoneme inventories at least)

I’ve been reading through an earlier draft of my dissertation and noticed a few paragraphs that were omitted due to word length. Despite not making the final cut, they serve as a nice reminder about where our data is coming from: that is, when we dive into WALS or UPSID, take a particular inventory and look at one of its phonemes, we’re viewing something that’s been ascribed by the investigators/observers of said language. Anyway, it’s basically about the Wichí language (a member of the Matacoan language family spoken in parts of South America’s Chaco region) and the various reports on its phoneme inventory size. N.B. The source is a PhD thesis by Megan Avram (2008).

Even if we accept the theoretical justification for the concept of a phoneme, there is still an additional problem of how these representations are measured and recorded. These problems are neatly highlighted in the debates surrounding the Wichí language and its phoneme inventory. For instance, back in 1981 Antonio Tovar published an article showing that Wichí had 22 consonants, whereas if you were to jump forward 13 years to 1994, Kenneth Claesson’s paper would tell you that it was down to just 16 consonants. This is quite a big difference. In WALS terms, Wichí has gone from having an average consonant inventory to a moderately small one. Great news then for those of you searching for a correlation between small communities (Wichí has approximately 25,000 speakers) and phoneme inventory size. Not so great on the reliability front.
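To see how a difference of six consonants pushes a language across a category boundary, here is a small sketch of the binning. The cut-offs are the ones I recall from the WALS chapter on consonant inventories (small up to 14, moderately small 15 to 18, average 19 to 25, moderately large 26 to 33, large 34 or more); treat them as an assumption and check them against WALS before relying on them. The function name is my own.

```python
def wals_consonant_class(n_consonants):
    """Bin a consonant count into WALS-style inventory size classes.
    The thresholds are assumed from memory of WALS, not taken from this post."""
    if n_consonants <= 14:
        return "small"
    if n_consonants <= 18:
        return "moderately small"
    if n_consonants <= 25:
        return "average"
    if n_consonants <= 33:
        return "moderately large"
    return "large"

# The two reports on Wichí discussed above.
for source, count in [("Tovar (1981)", 22), ("Claesson (1994)", 16)]:
    print(f"{source}: {count} consonants -> {wals_consonant_class(count)}")
```

The same language ends up in two different bins depending on whose analysis made it into the database, which is the reliability worry in a nutshell.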

Short of a conspiracy to bring the number of phonemes down (but see here), reasons for these differences are broad and varied. Some instances could be genuine differences between speech communities in the form of dialectal variation. Other reasons are more likely to be theoretically motivated. Take, as one of many examples, Claesson’s choice to omit glottalized consonants from his description of Wichí. His rationale is that these “are actually consonant clusters of a stop followed by a glottal stop” (Avram, 2008: 37-38). In summary, both sources of data are at the whims of subjectivity: for each language, or dialect, the study is reliant on the choices of potentially one researcher, at a very specific point in time, and with only a finite amount of resources (for a similar discussion, see the comments on Everett and recursion).

It’s straight out of phoneme inventories 101, but from time to time these little examples are useful as cautionary tales about the sources of data we often take for granted.

The Forgotten Linguist: Mikołaj Kruszewski

In the process of writing the first in a series of posts on the theoretical plausibility of the vanishing phonemes debate, I’ve found myself drawn into reading Daniel Silverman’s excellent two-part article (part one and part two) on Mikołaj Kruszewski (1851-1887). You might call him one of the many forgotten linguists who, along with other notable absentees in the linguistic hall of fame, such as Erwin Esper, could have been highly influential had their ideas reached a wider audience. Although it is difficult to assess his impact on the historical development of linguistics, Kruszewski’s theoretical insights certainly prefigured a lot of later work, especially regarding listener-based exemplar modelling and probability matching, as evident in this quote:

In the course of time, the sounds of a language depend on the gradual change of its articulation. We can pronounce a sound only when our memory retains an imprint of its articulation for us. If all our articulations of a given sound were reflected in this imprint in equal measure, and if the imprint represented an average of all these articulations, we, with this guidance, would always perform the articulation in question approximately the same way. But the most recent (in time) articulations, together with their fortuitous deviations, are retained by the memory far more forcefully than the earlier ones. Thus, negligible deviations acquire the capacity to grow progressively greater…
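The logic of the quote can be illustrated with a toy simulation. This is my own sketch, not Kruszewski’s or Silverman’s model: the articulation is reduced to a single number, the "fortuitous deviations" are assumed to be Gaussian noise, and the recency weight of 0.3 is arbitrary. With equal weighting the imprint is a running average and settles down; with recency weighting each new deviation shifts the imprint by a fixed fraction, so the imprint wanders like a random walk.

```python
import random

def simulate(steps, recency=None, seed=0, noise=1.0):
    """Return the final memory imprint after `steps` articulations.

    recency=None -> every past articulation counts equally (running average).
    recency=0.3  -> the newest articulation gets a fixed weight of 0.3 (assumed value).
    """
    rng = random.Random(seed)
    imprint = 0.0  # starting articulation target, in arbitrary units
    for t in range(1, steps + 1):
        # Each articulation aims at the imprint but deviates by chance.
        articulation = imprint + rng.gauss(0.0, noise)
        weight = recency if recency is not None else 1.0 / t
        # Update the imprint towards the newest articulation.
        imprint += weight * (articulation - imprint)
    return imprint

def mean_abs_drift(recency, runs=200, steps=2000):
    """Average distance of the final imprint from its starting point."""
    return sum(abs(simulate(steps, recency, seed=s)) for s in range(runs)) / runs

equal_drift = mean_abs_drift(recency=None)
recent_drift = mean_abs_drift(recency=0.3)
print(f"equal weighting: {equal_drift:.2f}, recency weighting: {recent_drift:.2f}")
```

In this sketch the recency-weighted imprint ends up much further from where it started, which is the sense in which small deviations "acquire the capacity to grow progressively greater".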

Silverman goes on to mention some of Kruszewski’s other major insights, such as: (1) the arbitrary relationship between sound and meaning, (2) the non-teleological nature of the linguistic system, (3) the generative or creative character of language, (4) the connectionist organisation of the lexicon, and (5) the optimality-theoretic-esque proposal that linguistic systems may be analysed as the product of pressures and constraints in inherent conflict with one another. There is also a brief mention of Darwin’s influence on Kruszewski’s work (as we can see in his non-teleological stance).

The story ends on somewhat of a sad note, with Kruszewski suffering from a debilitating neurological and mental illness that cut short his promising career at the age of 36, making his depth of scholarship and theoretical insight all the more impressive given it was produced in just eight years.

Anyway, you should take a look at the two articles, if only for an historical perspective on linguistics, but I would also suggest having a gander at some of Silverman’s other papers. He’s got his own ideas and insights that are worth considering (if you can wait, I’ll be discussing some of these in one of my next posts).

Everett, Pirahã and Recursion: The Latest

Discussing the concept of recursion is like a rite of passage for anyone interested in language evolution: you go through it once, take a position and hope it doesn’t come back to haunt you.  As Hannah pointed out last year, there are two definitions of recursion:

(1) embeddedness of phrases within other phrases, which entails keeping track of long-distance dependencies among phrases;

(2) the specification of the computed output string itself, including meta-recursion, where recursion is both the recipe for an utterance and the overarching process that creates and executes the recipe.

The case of grammatical recursion (see definition 1) is perhaps most famously associated with Noam Chomsky. Not only does he claim all human languages are recursive, but also that this ability is biologically hardwired as part of our genetic makeup. Countering Chomsky’s first claim is the debate surrounding a small Amazonian tribe called the Pirahã: even though they show signs of recursion, such as the ability to recursively embed structures within stories, the Pirahã grammar is claimed not to recursively embed phrases within other phrases. If true, then there are numerous implications for a wide variety of fields in linguistics, but this is still an unsubstantiated claim: for the most part, we are relying on one specific researcher (Daniel Everett) who, despite having dedicated a large portion of his life to studying the tribe, could very well have been misled. That said, I retain a large amount of respect for Everett, having watched him speak at Edinburgh a few years ago and read his book on the topic: Don’t Sleep, There are Snakes: Life and Language in the Amazonian Jungle.

So, why am I rambling on about recursion? Well, besides its obvious relevance, and perhaps its under-representation on this blog (deserved or not, I’ll let you decide), Everett has recently published a series of slides about a corpus study of Pirahã grammar.


His tentative conclusion: there is no strong evidence for recursion among relative clauses, complement clauses, possessive structures and conjunctions/disjunctions. However, there is possible evidence of recursive structure in topics/repeated arguments. He also posits cultural pressures for longer or shorter sentences, such as writing systems (as I mentioned way back in 2009).

I’m sure this debate will be brought to the fore at this year’s EvoLang, with Chomsky, Berwick, Piattelli-Palmarini and many of the Biolinguistic crowd in attendance, and it’s a shame I’ll almost certainly miss it (unless someone wants to pay for my ticket… Just hit the donate button in the left-hand corner 😉 ).
