models – Replicated Typo

Advances in Visual Methods for Linguistics (AVML2012)

Some peeps over at the University of York are organising a conference on advances in visual methods for linguistics (AVML), to take place in September next year. This might be of interest to evolutionary linguists who use things like phylogenetic trees, networks, visual simulations or other fancy dancy visual methods. The following is taken from their website:

Linguistics, like other scientific disciplines, is centrally reliant upon visual images for the elicitation, analysis and presentation of data. It is difficult to imagine how linguistics could have developed, and how it could be done today, without visual representations such as syntactic trees, psychoperceptual models, vocal tract diagrams, dialect maps, or spectrograms. Complex multidimensional data can be condensed into forms that can be easily and immediately grasped in a way that would be considerably more taxing, even impossible, through textual means. Transforming our numerical results into graphical formats, according to Cleveland (1993: 1), ‘provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way. We discover unimagined effects, and we challenge imagined ones.’ Or, as Keith Johnson succinctly puts it, ‘Nothing beats a picture’ (2008: 6).

So embedded are the ways we visualize linguistic data and linguistic phenomena in our research and teaching that it is easy to overlook the design and function of these graphical techniques. Yet the availability of powerful freeware and shareware packages which can produce easily customized publication-quality images means that we can create visual enhancements to our research output more quickly and more cheaply than ever before. Crucially, it is very much easier now than at any time in the past to experiment with imaginative and innovative ideas in visual methods. The potential for the inclusion of enriched content (animations, films, colour illustrations, interactive figures, etc.) in the ever-increasing quantities of research literature, resource materials and new textbooks being published, especially online, is enormous. There is clearly a growing appetite among the academic community for the sharing of inventive graphical methods, to judge from the contributions made by researchers to the websites and blogs that have proliferated in recent years (e.g. Infosthetics, Information is Beautiful, Cool Infographics, BBC Dimensions, or Visual Complexity).

In spite of the ubiquity and indispensability of graphical methods in linguistics it does not appear that a conference dedicated to sharing techniques and best practices in this domain has taken place before. This is less surprising when one considers that relatively little has been published specifically on the subject (exceptions are Stewart (1976), and publications by the LInfoVis group). We think it is important that researchers from a broad spectrum of linguistic disciplines spend time discussing how their work can be done more efficiently, and how it can achieve greater impact, using the profusion of flexible and intuitive graphical tools at their disposal. It is also instructive to view advances in visual methods for linguistics from a historical perspective, to gain a greater sense of how linguistics has benefited from borrowed methodologies, and how in some cases the discipline has been at the forefront of developments in visual techniques.

The abstract submission deadline is the 9th January.

Tea Leaves and Lingua Francas: Why the future is not easy to predict

We all take comfort in our ability to project into the future. Be it through arbitrary patterns in Spring Pouchong tea leaves, or statistical inferences about the likelihood that it will rain tomorrow, our accumulation of knowledge about the future rests on continued attempts at attaining certainty: that is, we wish to know what tomorrow will bring. Yet the difference between benignly staring at tea leaves and using computer models to predict tomorrow’s weather is fairly apparent: the former relies on a completely spurious relationship between tea leaves and future events, whereas the latter draws on our knowledge of weather patterns and uses it to extrapolate from currently available data into the future. Put simply: if there are dense grey clouds in the sky, then it is likely we’ll get rain. Conversely, if tea leaves arrange themselves into the shape of a middle finger, it doesn’t mean you are going to be continually dicked over for the rest of your life. Although, as I’ll attempt to make clear below, these are differences of degree rather than absolutes.

So, how are we going to get from tea leaves to lingua francas? Well, the other evening I found myself watching Dr Nicholas Ostler give a talk on his new book, The Last Lingua Franca: English until the Return to Babel. For those of you who aren’t familiar with Ostler, he’s a relatively well-known linguist who has written several successful books popularising socio-historical linguistics; he first came to my attention through Razib Khan’s detailed review of Empires of the Word. Indeed, on the basis of Razib’s post, I was not surprised by the depth of knowledge on display during the talk. On this note alone I’m probably going to buy the book, as the work certainly feeds into my own interest in historical contact between languages and its consequences. However, as you can probably infer from the previous paragraph, there were some elements I was slightly less impressed with, and it is here that we get into the murky realm between tea leaves and knowledge-based inferences. But first, here is a quick summary of what I took away from the talk:


Why evolutionary linguists shouldn’t study languages

How many languages do you speak? This is actually a difficult question, because there’s no such thing as a language, as I argue in this video.

This is a video of a talk I gave as part of the Edinburgh University Linguistics & English Language Society’s Soap Vox lecture series. I argue that ‘languages’ are not discrete, monolithic, static entities – they are fuzzy, emergent, complex, dynamic, context-sensitive categories. I don’t think anyone would actually disagree with this, yet some models of language change and evolution still include representations of a ‘language’ where the learner must ‘pick’ a language to speak, rather than picking variants and allowing higher-level categories like languages to emerge.

In this lecture I argue that languages shouldn’t be modelled as discrete, unchanging things by demonstrating that there’s no consistent, valid way of measuring the number of languages that a person speaks.

The slides aren’t always in view (it improves as the lecture goes on), but I’ll try to write this up as a series of posts soon.

A spin glass model of cultural consensus

Does your social network determine your rationality? When trying to co-ordinate with a number of other people on a cultural feature, the locally rational thing to do is to go with the majority. However, in certain situations it might make sense to choose the minority feature. This means that learning multiple features might be rational in some situations, even if there is a pressure against redundancy. I’m interested in whether there are situations in which it is rational to be bilingual, and whether bilingualism is stable over long periods of time. Previous models suggest that bilingualism is not stable (e.g. Castello et al. 2007), and is therefore an irrational strategy (or at least not a primary one), but these were based on locally rational learners.

This week we had a lecture from Simon DeDeo on system-wide timescales in the behaviour of macaques. He talked about spin glasses and rationality, which got me thinking. A spin glass is a kind of magnetic material in which the ‘spin’ or magnetism (plus or minus) of the molecules does not reach a consensus, but flips about chaotically. This happens when the structure of the material creates ‘frustrated’ triangles, where a molecule is trying to co-ordinate with other molecules that have opposing spins, making it difficult to resolve the tensions. Long chains of interconnected frustrated triangles can cause system-wide flips on the order of hours or days, and are difficult to study both in models (such as the Ising model) and in the real world.
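The ‘frustration’ in a triangle can be checked by brute force. The toy sketch below assumes three ±1 spins joined pairwise by antiferromagnetic bonds (a bond is satisfied when its two spins point in opposite directions); it illustrates the concept only and is not a model of any real material or of DeDeo’s macaque data. With just 2³ = 8 possible states, we can enumerate them all and ask how many bonds the best state manages to satisfy.

```python
from itertools import product

# Three spins on a triangle, every pair joined by an antiferromagnetic bond:
# a bond is "satisfied" when its two spins point in opposite directions.
EDGES = [(0, 1), (1, 2), (0, 2)]


def satisfied_bonds(spins):
    """Count the bonds whose two spins disagree."""
    return sum(1 for i, j in EDGES if spins[i] != spins[j])


def best_configurations():
    """Enumerate all 2**3 spin states and return those satisfying the most bonds."""
    states = list(product((+1, -1), repeat=3))
    best = max(satisfied_bonds(s) for s in states)
    return best, [s for s in states if satisfied_bonds(s) == best]


best, ground_states = best_configurations()
print(f"At most {best} of {len(EDGES)} bonds can be satisfied")
print(f"{len(ground_states)} equally good configurations")
```

No state satisfies all three bonds, and six different states tie for best, so there is no single configuration for the triangle to settle into: that is the unresolved tension the post describes.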


Cognitivism and the Critic 2: Symbol Processing

It has long been obvious to me that the so-called cognitive revolution is what happened when computation – both the idea and the digital technology – hit the human sciences. But I’ve seen little reflection of that in the literary cognitivism of the last decade and a half. And that, I fear, is a mistake.

Thus, when I set out to write a long programmatic essay, Literary Morphology: Nine Propositions in a Naturalist Theory of Form, I argued that we should think of the literary text as a computational form. I submitted the essay and found that both reviewers were puzzled about what I meant by computation. While publication was not conditioned on satisfying them on that point, I did make some efforts to do so, though I’d be surprised if they were completely satisfied by those efforts.

That was a few years ago.

Ever since then I have pondered the issue: how do I talk about computation to a literary audience? You see, some of my graduate training was in computational linguistics, so I find it natural to think of language processing as entailing computation. As literature is constituted by language, it too must involve computation. But without some background in computational linguistics or artificial intelligence, I’m not sure the notion is much more than a buzzword that’s been trendy for the last few decades – and that’s an awfully long time to be trendy.

I’ve already written one post specifically on this issue: Cognitivism for the Critic, in Four & a Parable, where I write abstracts of four texts which, taken together, give a good feel for the computational side of cognitive science. Here’s another crack at it, from a different angle: symbol processing.

Operations on Symbols

I take it that ordinary arithmetic is most people’s ‘default’ case for what computation is. Not only have we all learned it, it’s fundamental to our knowledge, like reading and writing. Whatever we know, think, or intuit about computation is built on our practical knowledge of arithmetic.

As far as I can tell, we think of arithmetic as being about numbers. Numbers are different from words. And they’re different from literary texts. And not merely different. Some of us – many of whom study literature professionally – have learned that numbers and literature are deeply and utterly different to the point of being fundamentally in opposition to one another. From that point of view the notion that literary texts be understood computationally is little short of blasphemy.

Not so. Not quite.

The question of just what numbers are – metaphysically, ontologically – is well beyond the scope of this post. But what they are in arithmetic, that’s simple; they’re symbols. Words too are symbols; and literary texts are constituted of words. In this sense, perhaps superficial, but nonetheless real, reading literary texts and making arithmetic calculations are the same kind of thing: operations on symbols.
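To make ‘operations on symbols’ concrete, here is a minimal sketch of my own (not from the essay) of pencil-and-paper column addition carried out purely as symbol manipulation. The procedure never converts the numerals into machine integers; it only consults a table saying which digit symbol follows which, and whether stepping past ‘9’ wraps round to ‘0’ (which is what produces a carry). The function names are hypothetical, chosen for illustration.

```python
DIGITS = "0123456789"

# Successor table: for each digit symbol, the next symbol and whether we wrapped past "9".
SUCCESSOR = {d: (DIGITS[(i + 1) % 10], i == 9) for i, d in enumerate(DIGITS)}


def add_digits(a, b, carry):
    """Add digit symbols a and b (plus an incoming carry) by stepping through the table."""
    result, carry_out = a, False
    counter = "0"
    # Step `result` forward once for every step it takes `counter` to reach b.
    while counter != b:
        result, wrapped = SUCCESSOR[result]
        carry_out = carry_out or wrapped
        counter, _ = SUCCESSOR[counter]
    if carry:
        result, wrapped = SUCCESSOR[result]
        carry_out = carry_out or wrapped
    return result, carry_out


def add_numerals(x, y):
    """Right-to-left column addition on two numeral strings, as done on paper."""
    width = max(len(x), len(y))
    x, y = x.rjust(width, "0"), y.rjust(width, "0")
    out, carry = [], False
    for a, b in zip(reversed(x), reversed(y)):
        digit, carry = add_digits(a, b, carry)
        out.append(digit)
    if carry:
        out.append("1")
    return "".join(reversed(out))


print(add_numerals("478", "95"))  # 573
```

The point is not efficiency: it is that the same kind of rule-following over a fixed inventory of symbols could, in principle, apply to words as readily as to digits.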

Statistics and Symbols in Mimicking the Mind

MIT recently held a symposium on the current status of AI, which apparently has seen precious little progress in recent decades. The discussion, it seems, ground down to a squabble over the prevalence of statistical techniques in AI and a call for a revival of work on the sorts of rule-governed models of symbolic processing that once dominated much of AI and its sibling, computational linguistics.

Briefly, from the early days in the 1950s up through the 1970s both disciplines used models built on carefully hand-crafted symbolic knowledge. The computational linguists built parsers and sentence generators and the AI folks modeled specific domains of knowledge (e.g. diagnosis in selected medical domains, naval ships, toy blocks). Initially these efforts worked like gang-busters. Not that they did much by Star Trek standards, but they actually did something and they did things never before done with computers. That’s exciting, and fun.

In time, alas, the excitement wore off and there was no more fun. Just systems that got too big and failed too often and they still didn’t do a whole heck of a lot.

Then, starting, I believe, in the 1980s, statistical models were developed that, yes, worked like gang-busters. And these models actually did practical tasks, like speech recognition and then machine translation. That was a blow to the symbolic methodology because these programs were “dumb.” They had no knowledge crafted into them, no rules of grammar, no semantics. Just routines that learned while gobbling up terabytes of example data. Thus, as Google’s Peter Norvig points out, machine translation is now dominated by statistical methods. No grammars and parsers carefully hand-crafted by linguists. No linguists needed.
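What ‘no rules, just data’ means can be seen in miniature with a bigram model. This is a toy sketch of my own, assuming a tiny made-up corpus; it is far simpler than the statistical speech recognition and translation systems mentioned above, but it shares their key property: nothing linguistic is hand-coded, and the model’s ‘knowledge’ is nothing more than counts of which word follows which.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which word follows which: no grammar, just co-occurrence counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probs(counts, prev):
    """Relative-frequency (maximum-likelihood) estimate of P(word | prev)."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {word: c / total for word, c in following.items()}


# A tiny hypothetical corpus, purely for illustration.
corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))  # dog ≈ 0.67, cat ≈ 0.33
print(next_word_probs(model, "dog"))  # {'barks': 0.5, 'sleeps': 0.5}
```

Give such a model more data and its estimates improve without anyone writing a rule; that is the trade-off the symposium was arguing about.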

What a bummer. For machine translation is THE prototype problem for computational linguistics. It’s the problem that set the field in motion and has been a constant arena for research and practical development. That’s where much of the handcrafted art was first tried, tested, and, in a measure, proved. For it to now be dominated by statistics . . . bummer.

So that’s where we are. And that’s what the symposium was chewing over.


Animal Signalling Theory 101: Handicap, Index… or even a signal? The Case of Fluctuating Asymmetry

Handicaps and indices are usually easy to tell apart in formal mathematical models or in unambiguous real-world cases. Often, though, classifying a trait as a handicap, an index, or even as a signal at all can be quite a difficult task.

For the purposes of illustration I will use Fluctuating Asymmetry (FA for short) as an example. Fluctuating asymmetry is the term used to refer to deviation from symmetry in paired morphological structures (ranging from birds’ tails to human faces) that should be, all being well, bilaterally symmetric. Deviations from the ideal symmetrical phenotype are caused by inherent genetic perturbations and exposure to environmental disturbances occurring in early development.
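For readers who want to see how FA is typically quantified, here is a minimal sketch, assuming left (L) and right (R) measurements of the same trait for each individual. It computes two common indices: the mean of |R − L|, and the same difference divided by trait size, (R + L)/2, so that larger traits are not automatically scored as more asymmetric. The measurements are invented, and a real analysis would also check that the signed differences R − L centre on zero (otherwise the asymmetry is directional rather than fluctuating) and that they exceed measurement error.

```python
from statistics import mean


def fa_absolute(pairs):
    """Mean unsigned left-right difference, |R - L|, across individuals."""
    return mean(abs(right - left) for left, right in pairs)


def fa_size_corrected(pairs):
    """Mean |R - L| divided by trait size, (R + L) / 2, per individual."""
    return mean(abs(right - left) / ((right + left) / 2) for left, right in pairs)


# Hypothetical (left, right) tail-feather lengths in mm for three birds.
tails = [(100.0, 102.0), (98.0, 98.0), (105.0, 101.0)]
print(fa_absolute(tails))  # 2.0
print(fa_size_corrected(tails))
```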

Is FA a signal?

In their 2003 book Animal Signals, Maynard Smith and Harper define a signal as:

‘Any act or structure which alters the behaviour of other organisms, which evolved because of that effect, and which is effective because the receiver’s response has also evolved’

They then argue that FA is unlikely to function as a signal because it is difficult to discern whether receivers respond directly to FA and because there appear to be few examples of displays in which signallers actively advertise their symmetry to receivers.


Dialects in Tweets

A recent study published in the proceedings of the Empirical Methods in Natural Language Processing Conference (EMNLP) in October and presented at the LSA conference last week found evidence of geographical lexical variation in Twitter posts. (For news stories on it, see here and here.) Eisenstein, O’Connor, Smith and Xing took a batch of Twitter posts from a released sample of 15% of all posts made during a week in March. In total, they kept 4.7 million tokens from 380,000 messages by 9,500 users, all geotagged from within the continental US. They cut out messages from over-active users, taking only messages from users with fewer than a thousand followers and followees (even so, 380,000 messages spread over 9,500 users means the average author posted around 40 times in that single week, which some might see as excessive). They also only took messages sent from iPhones and BlackBerries, which have the geotagging function. In the end, they were left with a vocabulary of just over 5,000 words, of which a quarter did not appear in the aspell spell-checking lexicon.

In order to capture lexical variation accurately, both the topic and the geographical region had to be inferred. To do this, they used a generative model (seen above) that accounts for both jointly. Generative models work on the assumption that text is the output of a stochastic process that can be analysed statistically. By looking at large amounts of text, they were able to infer the topics being talked about. Say I am thinking about a few topics: dinner, food, eating out. If I am in SF, it is likely that I will end up using the word taco in my tweet, given those topics. What the model does is take those topics and work out which words are chosen from them, while at the same time taking the spatial region of the author into account. This way, lexical variation is easier to place accurately, whereas before, discourse topic would have significantly skewed the results (the median error drops from 650 to 500 km, which isn’t that bad, all in all).

The way it works (in summary, and quoting the slide show presented at the LSA annual meeting, since I’m not entirely sure of the details) is that the model assumes the following generative process. For each author, the model a) picks a region from P(r | ϑ), b) picks a location from P(y | λ, ν), and c) picks a topic distribution from P(θ | α). For each token, it then a) picks a topic from P(z | θ), and b) picks a word from P(w | ν). Or something like that (sorry). For more, feel free to download the paper from Eisenstein’s website.
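The generative story is easier to follow if you run it forwards. The sketch below is mine, not the authors’ code: the regions, coordinates, vocabulary and probabilities are all invented, locations are drawn from a simple Gaussian around each region’s centre, and, on my reading of the paper (treat this as an assumption), each topic has a region-specific word distribution, so a word depends on both the topic z and the region r. The actual paper works in the opposite direction, inferring regions and topics from observed tweets; this only shows what the model assumes about how tweets come about.

```python
import random


def dirichlet(alpha, rng):
    """Sample from a Dirichlet distribution by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# --- Hypothetical toy parameters (not from the paper) ---
REGION_PRIOR = {"NY": 0.5, "SF": 0.5}                        # P(r)
REGION_CENTRE = {"NY": (40.7, -74.0), "SF": (37.8, -122.4)}  # mean (lat, lon) per region
TOPIC_ALPHA = [1.0, 1.0]                                     # Dirichlet prior on topics
# Word distributions indexed by (topic, region): a regional variant of each topic.
WORDS = {
    (0, "NY"): {"food": 0.5, "deli": 0.5},
    (0, "SF"): {"food": 0.5, "taco": 0.5},
    (1, "NY"): {"very": 0.5, "deadass": 0.5},
    (1, "SF"): {"very": 0.5, "hella": 0.5},
}


def pick(dist, rng):
    """Draw one key from an {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate_author(n_tokens, rng):
    region = pick(REGION_PRIOR, rng)                       # r ~ P(r)
    lat, lon = REGION_CENTRE[region]
    location = (rng.gauss(lat, 1.0), rng.gauss(lon, 1.0))  # y ~ Gaussian around the region (arbitrary 1-degree spread)
    theta = dirichlet(TOPIC_ALPHA, rng)                    # theta ~ Dirichlet(alpha)
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(range(len(theta)), weights=theta)[0]  # z ~ P(z | theta)
        tokens.append(pick(WORDS[(z, region)], rng))          # w ~ P(w | z, r)
    return region, location, tokens


rng = random.Random(0)
print(generate_author(8, rng))
```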

Well, what did they find? Basically, Twitter posts do show massive variation based on region. There are geographically specific proper names, of course, and topics of local prominence, like taco in LA and cab in NY. There’s also variation in foreign-language words, with pues in LA but papi in SF. More interestingly, however, there is a major difference in regional slang. ‘uu’, for instance, is used almost exclusively on the Eastern seaboard, while ‘you’ is spread across the nation (with ‘yu’ being only slightly less widespread). ‘suttin’ for something is used only in NY, as is ‘deadass’ (meaning very) and, on an even smaller scale, ‘odee’, while ‘af’ is used for very in the Southwest, and ‘hella’ is used in most of the Western states.

Figure: Dialectal variation for ‘very’

More importantly, though, the study shows that, using this model, we can separate geographical from topical variation, and discover geographical variation from text instead of relying solely on geotagging. The authors hope that future work will cover differences between spoken variation and variation in digital media. And I, for one, think that’s #deadass cool.

Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, & Eric P. Xing (2010). A Latent Variable Model for Geographic Lexical Variation. Proceedings of EMNLP

Top-down vs bottom-up approaches to cognition: Griffiths vs McClelland

There is a battle about to commence. A battle in the world of cognitive modelling. Or at least a bit of a skirmish. Two articles to be published in Trends in Cognitive Sciences debate the merits of approaching cognition from different ends of the microscope.

On the side of probabilistic modelling we have Tom Griffiths, Nick Chater, Charles Kemp, Amy Perfors and Joshua Tenenbaum. Representing (perhaps non-symbolically) emergentist approaches are James McClelland, Matthew Botvinick, David Noelle, David Plaut, Timothy Rogers, Mark Seidenberg and Linda B. Smith. This contest is not short of heavyweights.

However, the first battleground seems to be who can come up with the most complicated diagram. I leave this decision to the reader (see first two images).

The central issue is which approach is the more productive for explaining phenomena in cognition. David Marr’s levels of explanation include the ‘computational’ characterisation of the problem, an ‘algorithmic’ description of the representations and processes used to solve it, and an ‘implementational’ explanation which focusses on how the task is actually implemented by real brains. Structured probabilistic modelling takes a ‘top-down’ approach, starting from the computational level, while emergentism takes a ‘bottom-up’ approach, starting from the mechanisms.


From Natyural to Nacheruhl: Utterance Selection and Language Change

Most of us should know by now that language changes. It’s why the 14th-century English of Geoffrey Chaucer is nearly impenetrable to modern-day speakers of English. It is also why Benjamin Franklin’s phonetic transcription of the English word natural records a pronunciation like natyural (phonetically [nætjuɹəl]) rather than our modern variant with a ch sound (phonetically [nætʃəɹəl]). However, while it is often taken for granted on this blog that language change can be understood as an evolutionary process, many people might not see the utility of such thinking outside the realm of biology. On that view, evolutionary theory is strictly the preserve of describing biological change, and is less useful as a generalisable concept. A relatively recent group of papers, however, has taken the conceptual machinery of evolutionary theory (see Hull, 2001) and applied it to language.

Broadly speaking, these utterance selection models highlight that language change occurs across two steps, each corresponding to an evolutionary process: (1) the production of an utterance, and (2) the propagation of linguistic variants within a speech community. The first of these, the production of an utterance, takes place on an extremely short timescale: we replicate particular sounds, words, and constructions millions of times across our production lifetime. It is at this step that variation is generated: phonetic variation, for instance, arises not only because different speakers have different phonetic values for a single phoneme; the same speaker will also produce different phonetic values for a single phoneme depending on the context. Through variation comes the possibility of selection within a speech community. This leads us to our second timescale, which sees the selection and propagation of these variants, a process that may “take many generations of the replication of the word, which may–or may not–extend beyond the lifetime of an individual speaker.” (Croft, in press).

Recent mathematical work in this area has highlighted four selection mechanisms: replicator selection, neutral evolution, neutral interactor selection, and weighted interactor selection. I’ll now provide a brief overview of each of these mechanisms in relation to language change.
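The contrast between the first two mechanisms can be sketched with a Wright–Fisher-style toy. This is my own simplification, not the published utterance selection model (which tracks individual speakers and their interactions): a single community produces a fixed number of utterances per round, each one carrying variant A with a probability set by A’s current frequency, and the next round’s frequency is simply the proportion of A among those utterances, so production and propagation are collapsed into one step. The parameter `s` is a hypothetical production bias: `s = 0` gives neutral evolution (pure drift), while `s > 0` favours A on its own merits, which is the spirit of replicator selection. The two interactor-selection mechanisms, which weight whose utterances listeners attend to, would need individual speakers and are not covered here.

```python
import random


def next_frequency(p, n_utterances, s, rng):
    """One round of production: emit n_utterances, each variant A with a (possibly biased) probability.

    s = 0 gives neutral evolution (pure drift); s > 0 gives replicator selection,
    where variant A is favoured in production regardless of who uses it.
    """
    weighted = p * (1 + s)
    prob_a = weighted / (weighted + (1 - p))
    produced_a = sum(rng.random() < prob_a for _ in range(n_utterances))
    return produced_a / n_utterances


def run(p0, n_utterances, s, generations, seed=0):
    """Track the frequency of variant A over successive rounds of production."""
    rng = random.Random(seed)
    p = p0
    history = [p]
    for _ in range(generations):
        p = next_frequency(p, n_utterances, s, rng)
        history.append(p)
    return history


neutral = run(p0=0.1, n_utterances=200, s=0.0, generations=100)
selected = run(p0=0.1, n_utterances=200, s=0.1, generations=100)
print(f"neutral:  A at {neutral[-1]:.2f}")
print(f"selected: A at {selected[-1]:.2f}")
```

Under drift, A’s frequency wanders and, on average, stays where it started; with even a small bias it tends to rise towards fixation.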
