Corpus Linguistics, Literary Studies, and Description

One of my main hobbyhorses these days is description. Literary studies has to get a lot more sophisticated about description, which is mostly taken for granted and so is not done very rigorously. There isn’t even a sense that there’s something there to be rigorous about. Perhaps corpus linguistics is a way to open up that conversation.
The crucial insight is this: What makes a statement descriptive IS NOT how one arrives at it, but the role it plays in the larger intellectual enterprise.

A Little Background Music

Back in the 1950s there was this notion that the process of aesthetic criticism took the form of a pipeline that started with description, moved on to analysis, then interpretation and finally evaluation. Academic literary practice simply dropped evaluation altogether and concentrated its efforts on interpretation. There were attempts to side-step the difficulties of interpretation by asserting that one is simply describing what’s there. To this Stanley Fish has replied (“What Makes an Interpretation Acceptable?” in Is There a Text in This Class?, Harvard 1980, p. 353):

The basic gesture then, is to disavow interpretation in favor of simply presenting the text: but it actually is a gesture in which one set of interpretive principles is replaced by another that happens to claim for itself the virtue of not being an interpretation at all.

And that takes care of that.
Except that it doesn’t. Fish is correct in asserting that there’s no such thing as a theory-free description. Literary texts are rich and complicated objects. When the critic picks this or that feature for discussion, those choices are made with something in mind. They aren’t innocent.
But, as Michael Bérubé has pointed out in “There is Nothing Inside the Text, or, Why No One’s Heard of Wolfgang Iser” (in Gary Olson and Lynn Worsham, eds. Postmodern Sophistries, SUNY Press 2004, pp. 11-26), there is interpretation and there is interpretation, and they’re not alike. The process by which the mind’s eye makes out letters and punctuation marks from ink smudges is interpretive, for example, but it’s rather different from throwing Marx and Freud at a text and coming up with meaning.
Thus I take it that the existence of some interpretive component in any description does not imply that it is impossible to carve literary texts at their joints descriptively. And carving texts at their joints is one of the things I want from description.
Of course, one has to know how to do that. And THAT, it would seem, is far from obvious.

Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution

In scientific prognostication we have a condition analogous to a fact of archery—the farther back you draw your longbow, the farther ahead you can shoot.
– Buckminster Fuller

The following remarks are rather speculative in nature, as many of my remarks tend to be. I’m sketching large conclusions on the basis of only a few anecdotes. But those conclusions aren’t really conclusions at all, not in the sense that they are based on arguments presented prior to them. I’ve been thinking about cultural evolution for years, and about the need to apply sophisticated statistical techniques to large bodies of text—really, all the texts we can get, in all languages—by way of investigating cultural evolution.

So it is no surprise that this post arrives at cultural evolution and concludes with remarks on how the human sciences will have to change their institutional ways to support that kind of research. Conceptually, I was there years ago. But now we have a younger generation of scholars who are going down this path, and it is by no means obvious that the profession is ready to support them. Sure, funding is there for “digital humanities”, and so deans and department chairs can get funding and score points for successful hires. But you can’t build a profound new intellectual enterprise on financially driven institutional gamesmanship alone.

You need a vision, and though I’d like to be proved wrong, I don’t see that vision, certainly not on the web. That’s why I’m writing this post. Consider it a sequel to an article I published back in 1976 with my teacher and mentor, David Hays: “Computational Linguistics and the Humanist.” This post presupposes the conceptual framework of that article but neither restates nor endorses its specific recommendations (given in the form of a hypothetical program for simulating the “reading” of texts).

The world has changed since then and in ways neither Hays nor I anticipated. This post reflects those changes and takes as its starting point a recent web discussion about recovering the history of literary studies by using the largely statistical techniques of corpus linguistics in a kind of digital archaeology. But like Tristram Shandy, I approach that starting point indirectly, by way of a digression.

Who’s Kemp Malone?

Back in the ancient days when I was still an undergraduate, and we tied an onion in our belts as was the style at the time, I was at an English Department function at Johns Hopkins and someone pointed to an old man and said, in hushed tones, “that’s Kemp Malone.” Who is Kemp Malone, I thought? From his Wikipedia bio:

Born in an academic family, Kemp Malone graduated from Emory College as it then was in 1907, with the ambition of mastering all the languages that impinged upon the development of Middle English. He spent several years in Germany, Denmark and Iceland. When World War I broke out he served two years in the United States Army and was discharged with the rank of Captain.

Malone served as President of the Modern Language Association, and other philological associations … and was etymology editor of the American College Dictionary, 1947.

Who’d have thought the Modern Language Association was a philological association?

PLM2012 Coverage: Dirk Geeraerts: Corpus Evidence for Non-Modularity

The first plenary talk at this year’s Poznań Linguistic Meeting was by Dirk Geeraerts, who is professor of linguistics at the University of Leuven, Belgium.

In his talk, he discussed the possibility that corpus studies could yield evidence against the supposed modularity of language and mind endorsed by, for example, Generative linguists.

Geeraerts began his talk by stating that there seems to be a paradigm shift in linguistics, from analyses of structure based on introspection to analyses of behaviour based on quantitative linguistic studies. More and more researchers are adopting quantitative corpus-based analyses, which test hypotheses by applying statistical tests to language behaviour. As data, they use experimental results or large corpora.
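To make “statistical testing of language behaviour” a little more concrete, here is a minimal sketch of about the simplest such test: a Pearson chi-square test asking whether the choice between two competing variants of a construction depends on context. This is an illustration, not Geeraerts’ method. The function name chi_square_2x2 and the counts are hypothetical, invented for the example; real studies would use an established statistics package and, in the multifactorial case, would model many predictors at once (for example with logistic regression) rather than one factor at a time.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table[i][j] is the count of variant j observed in context i.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if variant choice were independent of context.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are two contexts (say, spoken vs. written),
# columns are two competing variants of the same construction.
counts = [[30, 70],
          [60, 40]]
stat, p = chi_square_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

A small p-value says only that variant choice and context are unlikely to be independent in this sample; it says nothing about the other factors that a multifactorial analysis would weigh simultaneously.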

Multifactoriality

One further trend Geeraerts identified in this paradigm shift is that these analyses are becoming increasingly multifactorial: they include many different factors, both internal and external to language. Importantly, this way of doing linguistics is fundamentally different from the mainstream late-20th-century view of linguistics.

What is important to note here, when comparing this trend to other approaches to studying language, is that multifactoriality goes against Chomsky’s idea of grammar as an ideal mental system that can be studied through introspection. In the traditional view, it is supposed that there is some kind of ideal language system to which everyone has access. This line of reasoning then justifies introspection as a method of studying the whole system of language and making valid generalizations about it. However, this goes against the emerging corpus-linguistic view of language. On this view, a random speaker is not representative of the linguistic community as a whole. The linguistic system is not homogeneous across all speakers, and therefore introspection doesn’t suffice.

Modularity

The main thrust of Geeraerts’ talk was that research within this emerging paradigm might also call into question the assumption of the modularity of the mind (as advocated, for example, by Jerry Fodor or Neil Smith): the view of the mind as a compartmentalized system consisting of discrete components or modules (for example, the visual system or language) plus a central processor.
