Corpus Linguistics, Literary Studies, and Description

One of my main hobbyhorses these days is description. Literary studies has to get a lot more sophisticated about description, which is mostly taken for granted and so is not done very rigorously. There isn’t even a sense that there’s something there to be rigorous about. Perhaps corpus linguistics is a way to open up that conversation.
The crucial insight is this: What makes a statement descriptive IS NOT how one arrives at it, but the role it plays in the larger intellectual enterprise.

A Little Background Music

Back in the 1950s there was a notion that aesthetic criticism took the form of a pipeline that started with description, moved on to analysis, then interpretation and finally evaluation. Academic literary practice simply dropped evaluation altogether and concentrated its efforts on interpretation. There were attempts to sidestep the difficulties of interpretation by asserting that one was simply describing what’s there. To this Stanley Fish has replied (“What Makes an Interpretation Acceptable?” in Is There a Text in This Class?, Harvard 1980, p. 353):


The basic gesture then, is to disavow interpretation in favor of simply presenting the text: but it actually is a gesture in which one set of interpretive principles is replaced by another that happens to claim for itself the virtue of not being an interpretation at all.


And that takes care of that.
Except that it doesn’t. Fish is correct in asserting that there’s no such thing as a theory-free description. Literary texts are rich and complicated objects. When the critic picks this or that feature for discussion, those choices are made with something in mind. They aren’t innocent.
But, as Michael Bérubé has pointed out in “There is Nothing Inside the Text, or, Why No One’s Heard of Wolfgang Iser” (in Gary Olson and Lynn Worsham, eds., Postmodern Sophistries, SUNY Press 2004, pp. 11-26), there is interpretation and there is interpretation, and they’re not alike. The process by which the mind’s eye makes out letters and punctuation marks from ink smudges is interpretive, for example, but it’s rather different from throwing Marx and Freud at a text and coming up with meaning.
Thus I take it that the presence of some interpretive component in any description does not imply that it is impossible to carve literary texts at their joints descriptively. And that’s one of the things I want from description: to carve texts at their joints.
Of course, one has to know how to do that. And THAT, it would seem, is far from obvious.

Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution

In scientific prognostication we have a condition analogous to a fact of archery—the farther back you draw your longbow, the farther ahead you can shoot.
– Buckminster Fuller

The following remarks are rather speculative in nature, as many of my remarks tend to be. I’m sketching large conclusions on the basis of only a few anecdotes. But those conclusions aren’t really conclusions at all, not in the sense that they are based on arguments presented prior to them. I’ve been thinking about cultural evolution for years, and about the need to apply sophisticated statistical techniques to large bodies of text—really, all the texts we can get, in all languages—by way of investigating cultural evolution.

So it is no surprise that this post arrives at cultural evolution and concludes with remarks on how the human sciences will have to change their institutional ways to support that kind of research. Conceptually, I was there years ago. But now we have a younger generation of scholars who are going down this path, and it is by no means obvious that the profession is ready to support them. Sure, funding is there for “digital humanities,” and so deans and department chairs can get funding and score points for successful hires. But you can’t build a profound new intellectual enterprise on financially driven institutional gamesmanship alone.

You need a vision, and though I’d like to be proved wrong, I don’t see that vision, certainly not on the web. That’s why I’m writing this post. Consider it a sequel to an article I published back in 1976 with my teacher and mentor, David Hays: Computational Linguistics and the Humanist. This post presupposes the conceptual framework of that vision, but neither restates nor endorses its specific recommendations (given in the form of a hypothetical program for simulating the “reading” of texts).

The world has changed since then and in ways neither Hays nor I anticipated. This post reflects those changes and takes as its starting point a recent web discussion about recovering the history of literary studies by using the largely statistical techniques of corpus linguistics in a kind of digital archaeology. But like Tristram Shandy, I approach that starting point indirectly, by way of a digression.

Who’s Kemp Malone?

Back in the ancient days when I was still an undergraduate, and we tied an onion in our belts as was the style at the time, I was at an English Department function at Johns Hopkins and someone pointed to an old man and said, in hushed tones, “that’s Kemp Malone.” Who is Kemp Malone, I thought? From his Wikipedia bio:

Born in an academic family, Kemp Malone graduated from Emory College as it then was in 1907, with the ambition of mastering all the languages that impinged upon the development of Middle English. He spent several years in Germany, Denmark and Iceland. When World War I broke out he served two years in the United States Army and was discharged with the rank of Captain.

Malone served as President of the Modern Language Association, and other philological associations … and was etymology editor of the American College Dictionary, 1947.

Who’d have thought the Modern Language Association was a philological association?

Digital Humanities Sandbox Goes to the Congo

Or, Speculations in Computational Evolutionary Psychology

Note: This version of the post has been revised from an earlier version in which I suggested that the distribution in the first chart followed a power law. Cosma Shalizi checked it for me and it’s not a power law distribution. It’s an exponential distribution.
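A power law and an exponential can look alike on an ordinary bar chart, so the correction is worth making concrete. The post does not say how the check was done; what follows is only a minimal sketch of one standard way to compare the two candidates: fit each by maximum likelihood above a common cutoff (here, as an assumption, the smallest observed value x_min) and compare log-likelihoods. For a power law the estimate is alpha = 1 + n / Σ ln(x_i / x_min); for an exponential shifted to start at x_min it is lambda = 1 / (mean − x_min). The function name `compare_fits` and the synthetic data are hypothetical, not the author’s data or Shalizi’s procedure, and word counts are treated as continuous, which is an approximation.

```python
import math
import random
from typing import Dict, Sequence, Union


def compare_fits(values: Sequence[float]) -> Dict[str, Union[float, str]]:
    """Fit a power law and a shifted exponential by maximum likelihood,
    both anchored at x_min = min(values), and compare log-likelihoods.

    Hypothetical helper: treats whole-number word counts as continuous data.
    """
    xs = [float(x) for x in values]
    n = len(xs)
    x_min = min(xs)
    if x_min <= 0:
        raise ValueError("all values must be positive")

    # Power law: p(x) = (alpha - 1) / x_min * (x / x_min) ** -alpha
    sum_log = sum(math.log(x / x_min) for x in xs)
    if sum_log == 0:
        raise ValueError("need at least two distinct values")
    alpha = 1.0 + n / sum_log
    ll_power = n * math.log(alpha - 1) - n * math.log(x_min) - alpha * sum_log

    # Shifted exponential: p(x) = lam * exp(-lam * (x - x_min))
    lam = 1.0 / (sum(xs) / n - x_min)
    # At the MLE, sum(lam * (x - x_min)) equals n, so the exponent term is -n.
    ll_exp = n * math.log(lam) - n

    return {
        "alpha": alpha,
        "lambda": lam,
        "loglik_power": ll_power,
        "loglik_exp": ll_exp,
        "better": "exponential" if ll_exp > ll_power else "power law",
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic exponential data (mean excess of 50), NOT the Heart of Darkness counts.
    synthetic = [1 + rng.expovariate(1 / 50) for _ in range(2000)]
    result = compare_fits(synthetic)
    print(result["better"], round(result["loglik_exp"] - result["loglik_power"], 1))
```

A larger log-likelihood means a better fit, but a small gap can be noise; a careful analysis would add a significance test (for example, a Vuong-style likelihood-ratio test) rather than reading off the raw numbers.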

So, I’ve been exploring Conrad’s Heart of Darkness. In the last two posts I’ve examined one paragraph in the text, the so-called nexus. It’s the longest paragraph in the text, it’s structurally central, and it covers a lot of semantic territory.

OK, but what about the other paragraphs?

What about them?

Aren’t you going to look at them?

Well, yeah, but I sure don’t have time to troll through them like I did the nexus. I mean, that post stretched from here to Sunday.

I get your point. Why don’t you do the Moretti thing?

Moretti thing?

You know, distant reading.

Distant reading? You mean count something? Count what?

How about paragraph length?

What’ll that get me?

I don’t know. Just do it. I mean, you already know that the nexus is the longest paragraph in the text. There must be something going on with that. Mess around and see if something turns up.

* * * * *
I did and it did.

I used the MS Word word-count tool to count the words in every paragraph in the text. All 198 of them. One at a time. Real tedious stuff. Then I loaded the results into a spreadsheet and created a bar chart showing paragraph length from longest to shortest:

[Figure: bar chart of paragraph length in Heart of Darkness, all 198 paragraphs ordered from longest to shortest]
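The counting step is easy to automate. Below is a minimal sketch, assuming a plain-text copy of the novella with paragraphs separated by blank lines; the function names are hypothetical, and splitting on whitespace will not always agree exactly with Word’s word-count tool (dashes and hyphenated compounds, for instance, may be counted differently). It counts the words in each paragraph, ranks the paragraphs from longest to shortest, and prints a crude text bar chart in place of the spreadsheet.

```python
import re
from typing import List, Tuple


def paragraph_lengths(text: str) -> List[int]:
    """Count whitespace-separated words in each paragraph.

    Assumes paragraphs are separated by one or more blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [len(p.split()) for p in paragraphs]


def ranked(lengths: List[int]) -> List[Tuple[int, int]]:
    """Pair each paragraph's position (starting at 1) with its word count,
    sorted from longest to shortest. Ties keep their original order."""
    numbered = list(enumerate(lengths, start=1))
    return sorted(numbered, key=lambda pair: pair[1], reverse=True)


def text_bar_chart(pairs: List[Tuple[int, int]], width: int = 50) -> str:
    """Render one row per paragraph: position, word count, and a bar
    scaled so the longest paragraph gets `width` characters."""
    longest = max(count for _, count in pairs)
    rows = []
    for position, count in pairs:
        bar = "#" * max(1, round(width * count / longest))
        rows.append(f"{position:>4} {count:>5} {bar}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Tiny stand-in text; in practice you would read the whole novella from a file.
    sample = "One two three.\n\nFour five.\n\nSix seven eight nine ten."
    print(text_bar_chart(ranked(paragraph_lengths(sample)), width=10))
```

Keeping each paragraph’s original position alongside its count means the longest bars can be traced back to their place in the text, which is how a paragraph like the nexus would show up at the head of the ranking.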

Swarm Intelligence

I just finished watching this great BBC documentary about swarm intelligence. Ignoring the presenter’s attempt to inspire fear in us mere humans, with ominous suggestions of a great red fire ant invasion, swarm intelligence is basically the notion that swarms of creatures (such as the aforementioned ant) work as a collective consciousness. It makes intuitive sense: more minds = more processing power. Of course, these species have been shaped by natural selection to function in this eusocial manner, although whether we’re discussing inclusive fitness, superorganisms or something else remains outside the programme’s scope. In fact, swarm intelligence doesn’t seem to be a conventional term amongst biologists; too many anthropomorphic connotations, no doubt.

Sadly, it was only available on the BBC until 21.39 GMT yesterday. Instead, I’ll leave you with the first YouTube video I could find about swarm intelligence, which is actually nothing to do with animals and more to do with computing, networks and information management.

Dawkins and his army of mildly irritating atheists

As many of my friends will know, I’m quite a big fan of Richard Dawkins. But for some reason his website seems to spawn an army of what can only be described as mildly irritating atheists. One thing I frequently bemoan is the compulsion some members feel to contort a fairly innocuous article into an anti-religious rant. Honestly, it’s unceasing. Take, for example, this article discussing Neanderthals. Just scroll down to the comments and you’ll be greeted with:

It all just gets you thinking about why the Neanderthals died out. I’ll theorize that their bigger brains found our ancestors’ ravings about our divine origins totally hysterical — and the resulting campaign of genocide simply took the poor buggers off guard.

Funny, yes? Well, no, not when the same joke/theme/structure is applied over and over again. I’ll throw out some more, as I don’t want to single out one individual:

I like to imagine science as a massive guillotine, with creationists frantically trying to stick objects to stop it’s progress. Science may be moving somewhat slowly, but nothing can really stop its progress. (From an article about RNA as a precursor for life on Earth.)

At least one commenter was honest enough to give up any pretence of being interested in the article itself:

When you look at the big picture, Obama has a tough road ahead of him and needs to harvest the support of everyone in America, even that large swath of intolerant, evangelical America, who are Americans none the less. I can’t say that I agree with the choice but I understand it. (About, um, quantum computing…)

Maybe it’s a running joke? Or perhaps it’s what you should expect from a website run by Richard Dawkins? Personally, I find these comments an effective cure for insomnia: their repetitive nature guarantees instant sleep, plus you’ll probably learn something new (from the articles, at least). Oh, and one more thing: what’s with Dawkins’ DVD covers?

Does this make your skin crawl?
The serene sea, the steely gaze, the god-like pose. Yes, it makes my skin crawl too.