Corpus Linguistics, Literary Studies, and Description

One of my main hobbyhorses these days is description. Literary studies has to get a lot more sophisticated about description, which is mostly taken for granted and so is not done very rigorously. There isn’t even a sense that there’s something there to be rigorous about. Perhaps corpus linguistics is a way to open up that conversation.
The crucial insight is this: What makes a statement descriptive IS NOT how one arrives at it, but the role it plays in the larger intellectual enterprise.

A Little Background Music

Back in the 1950s there was this notion that the process of aesthetic criticism took the form of a pipeline that started with description, moved on to analysis, then interpretation and finally evaluation. Academic literary practice simply dropped evaluation altogether and concentrated its efforts on interpretation. There were attempts to side-step the difficulties of interpretation by asserting that one is simply describing what’s there. To this Stanley Fish has replied (“What Makes an Interpretation Acceptable?” in Is There a Text in This Class?, Harvard 1980, p. 353):

 

The basic gesture then, is to disavow interpretation in favor of simply presenting the text: but it actually is a gesture in which one set of interpretive principles is replaced by another that happens to claim for itself the virtue of not being an interpretation at all.

 

And that takes care of that.
Except that it doesn’t. Fish is correct in asserting that there’s no such thing as a theory-free description. Literary texts are rich and complicated objects. When the critic picks this or that feature for discussion those choices are done with something in mind. They aren’t innocent.
But, as Michael Bérubé has pointed out in “There is Nothing Inside the Text, or, Why No One’s Heard of Wolfgang Iser” (in Gary Olson and Lynn Worsham, eds. Postmodern Sophistries, SUNY Press 2004, pp. 11-26) there is interpretation and there is interpretation and they’re not alike. The process by which the mind’s eye makes out letters and punctuation marks from ink smudges is interpretive, for example, but it’s rather different from throwing Marx and Freud at a text and coming up with meaning.
Thus I take it that the existence of some kind of interpretive component in any description need not imply that it is impossible to descriptively carve literary texts at their joints. And that’s one of the things that I want from description, to carve texts at their joints.
Of course, one has to know how to do that. And THAT, it would seem, is far from obvious.

Description in Biology

But then, our neighbors, the biologists, faced the same problem. Darwin’s theory of evolution was built on an extensive base of descriptive work. Without those descriptions of flora and fauna and their life ways, Darwin would have had nothing to theorize about. For his theory is based on patterns he discerned in those descriptions.
Not only did it take a couple of centuries for his predecessors to assemble those descriptions, but it took time for them to figure out just how to make those descriptions (see Brian W. Ogilvie, The Science of Describing: Natural History in Renaissance Europe, 2006). The descriptions, of course, were not merely verbal, but consisted of drawings as well, and both were associated with reference collections.
One of the most important papers in 20th century biology is descriptive. I’m talking about Watson and Crick’s 1953 paper, “A Structure for Deoxyribose Nucleic Acid” (Nature 171, 737-738). The paper, which includes a simple diagram, simply asserts that the DNA molecule takes the form of a double helix. That’s a descriptive assertion, no more, no less.
And it’s hardly theory-free. You couldn’t just put a DNA molecule on a stand and then photograph it or draw it. You had to crystallize it and then blast the crystal with x-rays directed at a photographic plate. You then develop the images, examine the smudges, and attempt to figure out what kind of geometry would produce just those smudges when blasted by x-rays.

That’s a theory-intensive process. And it worked. Watson and Crick succeeded in carving nature at its joints, very small ones invisible to the eye.
Much of the descriptive work on which Darwin relied was done by careful naked-eye observation, but also by comparative analysis of many different specimens. Organisms are complex, having many parts that can be described and, of course, relationships between those parts. Comparative analysis gave naturalists clues about which features to emphasize in descriptive work and which could be ignored. It is through thinking about similarities and differences among species over time and space that Darwin was able to arrive at his account of natural selection.

 

Nothing in molecular biology is done by naked eye observation. All observations are mediated by complex apparatus, apparatus whose theory of operation must be taken into account in interpreting the observational data. Given Watson and Crick’s description of the DNA molecule, it was then possible to think about how biological information can be transmitted and modified. In both cases, Darwin’s theory and cellular replication, descriptions provide the basis on which theorizing can be constructed.

Describing Literary Texts

Can we do that with literary texts, describe them in more revealing ways than we’ve done so far? I believe so, but we’ve got some work to do in order to figure out what to look for. Oh, we do know quite a bit already. We know how to look for rhyme schemes in poetry and how to tell the difference between plot order and story order in a narrative. And so forth. But we need to do more, and to be more systematic.

 

As I indicated at the start, corpus linguistics seems to me a way to get that conversation going. To be sure, it’s not a technique for describing individual texts. It’s not going to do the kind of carving I’ve been most concerned with in my own work. It won’t produce the literary equivalent of all those descriptions of flora and fauna which Darwin had at his disposal.
But that’s OK. It may even be an advantage. For it’s the individual texts that critics are most skittish about. Corpus linguistics is thus a way to begin theorizing description without having to worry so much about individual texts.
Why? Because corpus techniques ARE descriptive. They tell you what is there, but it is up to you to make sense of it. And to do that you have to know something about how the description is done. Corpus techniques rely on certain assumptions about how texts are structured. What are those assumptions? What do they imply about language? Corpus techniques require a large body of texts, and they tend to improve as the number of texts gets larger, assuming, of course, that the necessary computing power is available. Why is that?
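To make one such assumption concrete, here is a toy sketch in Python. It is my own illustration, not any particular published method: the four-sentence corpus, the stopword list, and the function names tokenize and cooccurrence_counts are all invented for the purpose. It treats each text as an unordered bag of words and simply counts how many texts contain both members of a word pair. That is about the crudest corpus description one can make, and yet it already assumes something about language, namely that words which belong together tend to turn up in the same texts.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration only (an assumption, not real data).
corpus = [
    "the whale swam in the sea",
    "the ship sailed the sea",
    "the poem has rhyme and meter",
    "rhyme and meter shape the poem",
]

# Hypothetical stopword list: very common words we choose to ignore.
STOPWORDS = {"the", "in", "and", "has"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def cooccurrence_counts(docs):
    """For each unordered word pair, count the documents containing both words.

    Word order inside a document is thrown away: this is the
    'bag of words' assumption.
    """
    counts = Counter()
    for doc in docs:
        vocab = sorted(set(tokenize(doc)))
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(corpus)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The pairs that rise to the top are the ones drawn from "meter," "poem," and "rhyme," while a pair like "poem" and "whale" never occurs at all. The counts say only what is there. Deciding that one cluster is "about" poetry and another "about" the sea is our doing, and with four sentences the pattern is fragile, which is one reason such methods want a large body of texts.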
At the moment, corpus techniques provide the most successful form of machine translation from one language to another. The translations are by no means perfect. They are often distinctly strange and even unintelligible. But they work often enough to be useful for some purposes. But why do they work at all?
If literary critics are going to make fruitful use of corpus techniques for investigating large bodies of text, then they’re going to have to be able to answer those questions. The answers don’t need to be in full technical detail—I certainly can’t do that myself—but they will require more knowledge of linguistics than is currently the norm for literary study.
And they will require some serious thinking about description. Just what is being described when you extract 100 or 150 “topics” from the corpus of PMLA (Publications of the Modern Language Association) articles, as Andrew Goldstone and Ted Underwood have recently done? Why do such topics emerge at all?
Those questions are interesting, and the answers are not simple. If signs in texts really are running in the infinite loops of hide-and-go-seek implied by so much recent thought about texts, then the corpus techniques wouldn’t produce anything at all, much less usable translations. If texts have the kind of structure that can support such statistical methods, then perhaps that structure is rigorous enough to support the close analysis of individual texts that goes beyond the free-form and impressionistic methods of New Critical close reading. Perhaps thoughtful consideration of the underpinnings of corpus techniques will give critics the insight and intuitions needed to re-create close analysis in new and more rigorous ways.

Such work might even constitute a response and riposte to laments that digital humanities lacks theory, which means critical theory, as though that were the only worthwhile kind of theory. It isn’t, but that’s another discussion.

 

Addendum, 12.29.12: After having taken this post to the bathtub this morning I have an addendum.
I want to have a certain kind of discussion about description, one that focuses on individual texts and comparisons between a small number of texts. This is mostly what I had in mind in my long methodological essay on literary morphology, and what I’ve done with, e.g. “Kubla Khan,” “This Lime-Tree Bower My Prison,” and Heart of Darkness. While there’s some interest along those lines, there isn’t, so far as I can tell, very much.
The situation is different for corpus linguistics. There’s a lot of interest, or at least curiosity, about those techniques. For they press on the whole profession, sorta’. But if anything broadly useful is to come of those techniques, then people need to understand how they work, and that requires knowing how language is structured so that such techniques CAN work. Coming to grips with that necessarily entails coming to grips with the phenomena underlying the “handicraft” techniques I want to apply to individual texts and comparisons between those texts.