Free online Natural Language Processing course

January 11, 2012 in Uncategorized

Chris Manning and Dan Jurafsky are running a free online 8-week course on Natural Language Processing for students worldwide, January 23rd – March 18th 2012:

For those of you who know students or colleagues who might be looking for an introduction to NLP next quarter, encourage them to join us and the 40,000 students who have already registered in the course!

Students have access to screencast lecture videos, are given quiz questions, review exams and programming assignments in Java or Python, receive regular feedback on progress, and can participate in a discussion forum.

The course covers a broad range of topics in natural language processing at the advanced undergraduate or introductory graduate level, including word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering. We will also introduce the underlying theory from probability, statistics, and machine learning that is crucial for the field, and cover fundamental algorithms like n-gram language modeling, Naive Bayes and maxent classifiers, sequence models, probabilistic dependency and constituent parsing, and vector-space models of meaning.

You can find more information about joining at http://www.nlp-class.org/

The power of diversity: New Scientist recognises the growing work on social structure and linguistic structure

December 14, 2011 in Uncategorized

A feature article in last week’s New Scientist asks why there is so much linguistic diversity in the world, and what forces drive it.  The article reads like a who’s who of the growing field of language structure and social structure:  Mark Pagel, Gary Lupyan, Quentin Atkinson, Robert Munroe, Carol and Melvin Ember, Dan Dediu and Robert Ladd, Stephen Levinson (click on the names to see some Replicated Typo articles about their work).  This is practically as close as my subject will come to having a pull-out section in Vanity Fair.  Furthermore, it recognises the weakening grip of Chomskyan linguistics.

Commentators have already gotten hung up on whether English became simplified before or after spreading, but this misses the impact of the article:  there is an alternative approach to linguistics which looks at the differences between languages and recognises social factors as the primary source of linguistic change.  Furthermore, these ideas are testable using statistics and genetic methods.  It’s a pity the article didn’t mention the possibility of experimental approaches, including Gareth Roberts’s work on emerging linguistic diversity and work on cultural transmission using the Pictionary paradigm (Simon Garrod, Nick Fay, Bruno Galantucci, see here and here).

David Robson (2011). Power of Babel: Why one language isn’t enough. New Scientist, 2842. Online.

Spurious correlation bonanza to mark Replicated Typo 2.0 reaching 100,000 hits

November 30, 2011 in Uncategorized

Replicated Typo 2.0 has reached 100,000 hits!  The most popular search term that leads visitors here is ‘What makes humans unique?’ and part of the answer has to be our ability to transmit our culture.  But as we’ve shown on this blog, culturally transmitted features can be highly correlated with each other.  This fact is a source of both frustration and fascination, so I’ve roped together some of my favourite investigations of cultural correlations into a correlation super-chain.  In addition, there’s a whole new spurious correlation at the end of the article!

Let Replicated Typo take you on a trip from acacia trees to traffic accidents…


Chomsky on Language Evolution

November 11, 2011 in Uncategorized

Noam Chomsky recently gave a lecture on the poverty of the stimulus at UCL, responding to topics such as language evolution and artificial language learning experiments. From about 89 minutes in he discusses iterated learning and language evolution, saying the conclusions derive from “serious illusions about evolution”.

Chomsky’s criticism of iterated learning experiments (see post here and here) is based on two points.  First, the emergence of structure has more to do with the intelligence of the modern humans taking part in the experiment than with a realistic scenario of language evolution.  He suggests that structure would not emerge in a series of computer programs without human intelligence.  As a colleague pointed out, however, the first iterated learning experiments used computational models of this kind.  Secondly, he suggests that the view of evolution employed in the explanation of these systems is a pop-psychology, gradual hill-climbing one.  In fact, Chomsky claims, the evolution of traits such as language or eyes derives from single, frozen accidents.  That is, evolution moves in leaps and bounds rather than small steps (Jim Hurford recently gave a lecture entitled ‘Reconciling linguistic jerks and biological creeps’ on this topic).  Why else would humans be the only species with language?

Geoffrey Pullum counters this last point by asking why an innately specified UG would emerge so rapidly, but then freeze for tens of thousands of years, when (borrowing Philip Lieberman’s point) traits such as lactose tolerance have emerged in the human genome within two thousand years.  Chomsky gives some examples of traits that have developed rapidly, but then only changed marginally.

I don’t think that proponents of iterated learning paradigms would have a problem with a sudden emergence of a capacity for advanced linguistic communication.  Although there is a continuity between human and non-human communication systems, we have some tricks that other animals don’t (see Michael’s post here).  However, the evolution of the structure of language after these mutations could owe a huge amount to processes of cultural transmission.  The universals we see in the world’s languages, then, would be an amplification of weak biological biases.

However, Chomsky seems disillusioned with the whole field of what he calls ‘the evolution of communication’.  At least we didn’t get it as bad as exemplar theory, which he dismisses as “so outlandish it’s not worth thinking about”.

[Edit: I originally attributed this point to Mark Liberman instead of Philip Lieberman.  Now I've made this error in both directions!]

Great Andamanese: The key to more than one linguistic puzzle?

November 9, 2011 in Uncategorized

Last week we had a lecture from Anvita Abbi on rare linguistic structures in Great Andamanese – a language spoken in the Andaman Islands.  The indigenous populations of the Andaman Islands lived in isolation for tens of thousands of years until the 19th Century, but still exhibit some common features of south-east Asian languages such as retroflex consonants.  This could be evidence for the migration route of humans from India to Australia.  Indeed, recent genetic research has shown that the Andamanese are descendants of the first human migration from Africa in the Palaeolithic, though Abbi suggested that the linguistic evidence is also a strong marker of human migration and an “important repository of our shared human history and civilization”.

Although the similarities are fascinating for studies of cultural evolution, the rarity of some structures in Great Andamanese is even more intriguing.

The Andaman Islands


James Hurford: Animals Do Not Have Syntax (Compositional Syntax, That Is)

October 30, 2011 in Evolution, Science, Uncategorized

After passing my final exams I feel that I can relax a bit and have the time to read a book again. So instead of reading a book that I need to read purely for ‘academic reasons’, I thought I’d pick one I’d thoroughly enjoy: James Hurford’s “The Origins of Grammar”, which clocks in at a whopping 808 pages.

I’m still reading the first chapter (which you can read for free here) but I thought I’d share some of his analyses of “Animal Syntax.”

Hurford’s general conclusion is that, despite what you sometimes read in the popular press,

“No non-human has any semantically compositional syntax, where the form of the syntactic combination determines how the meanings of the parts combine to make the meaning of the whole.”

The crucial notion here is that of compositionality. Hurford argues that we can find animal calls and songs that are combinatorial, that is, songs and calls in which elements are put together according to some kind of rule or pattern. But what we do not find, he argues, is the kind of combination in which each element has a specified meaning and the whole song, call or communicative assembly “means something which is a reflection of the meanings of the parts.”

To illustrate this, Hurford cites the call system of putty-nosed monkeys (Arnold and Zuberbühler 2006). These monkeys have only two different call signals in their repertoire, a ‘pyow’-sound that ‘means’, roughly, ‘LEOPARD’; and a ‘hack’ sound that ‘means’, roughly, ‘EAGLE’.


Empirical approaches to musical protolanguage theory

October 29, 2011 in Uncategorized

Keelin Murray talks about empirical approaches to musical protolanguage theory at this year’s Protolang 2 conference in Poland.

Videos of other talks are also available on the Protolang 2 website.

Cultural evolution of the individual

October 28, 2011 in Uncategorized

From Saturday Morning Breakfast Cereal:


My thesis also looks like a lot of thought scribbles at the moment.

Why evolutionary linguists shouldn’t study languages

October 25, 2011 in Uncategorized

How many languages do you speak?  This is actually a difficult question, because there’s no such thing as a language, as I argue in this video.

This is a video of a talk I gave as part of the Edinburgh University Linguistics & English Language Society’s Soap Vox lecture series.  I argue that ‘languages’ are not discrete, monolithic, static entities – they are fuzzy, emergent, complex, dynamic, context-sensitive categories.  I don’t think anyone would actually disagree with this, yet some models of language change and evolution still include representations of a ‘language’ where the learner must ‘pick’ a language to speak, rather than picking variants and allowing higher-level categories like languages to emerge.

In this lecture I argue that languages shouldn’t be modelled as discrete, unchanging things by demonstrating that there’s no consistent, valid way of measuring the number of languages that a person speaks.

The slides aren’t always in view (it improves as the lecture goes on), but I’ll try and write this up as a series of posts soon.

The origins of word order

October 18, 2011 in Uncategorized

A paper by Gell-Mann & Ruhlen in PNAS this week conducts a phylogenetic analysis of word order in languages and concludes that SOV is the most likely word order of the ancestral language.  The main conclusions from the analysis are:

(i) The word order in the ancestral language was SOV.

(ii) Except for cases of diffusion, the direction of syntactic change, when it occurs, has been for the most part SOV > SVO and, beyond that, SVO > VSO/VOS with a subsequent reversion to SVO occurring occasionally. Reversion to SOV occurs only through diffusion.

(iii) Diffusion, although important, is not the dominant process in the evolution of word order.

(iv) The two extremely rare word orders (OVS and OSV) derive directly from SOV.

This analysis agrees with Luke Maurits’ work on function and Uniform Information Density (blogged about here).
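
The directions of change in conclusions (ii) and (iv) above can be sketched as a small directed graph.  The Python below is only my own illustration, not anything taken from the paper: the names `NATURAL_CHANGES` and `reachable` are hypothetical, and the graph keeps only the dominant tendencies, ignoring both the “for the most part” hedge and all changes due to diffusion.

```python
# Hypothetical encoding of conclusions (ii) and (iv): each word order maps to
# the orders it can change into WITHOUT diffusion. Illustrative only.
NATURAL_CHANGES = {
    "SOV": {"SVO", "OVS", "OSV"},  # (ii) SOV > SVO; (iv) OVS and OSV derive from SOV
    "SVO": {"VSO", "VOS"},         # (ii) SVO > VSO/VOS
    "VSO": {"SVO"},                # (ii) occasional reversion to SVO
    "VOS": {"SVO"},                # (ii) occasional reversion to SVO
    "OVS": set(),                  # no further changes are listed for the rare orders
    "OSV": set(),
}


def reachable(start):
    """Return every word order that can be reached from `start` in one or more steps."""
    seen = set()
    frontier = [start]
    while frontier:
        order = frontier.pop()
        for nxt in NATURAL_CHANGES[order]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    for order in NATURAL_CHANGES:
        print(order, "->", sorted(reachable(order)))
```

In this toy graph every other order is reachable from SOV, but SOV is not reachable from any of them, which is the sense in which reversion to SOV “occurs only through diffusion”.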