Digital Humanities Sandbox Goes to the Congo, Part II
While Kurtz is the center of attention in Heart of Darkness, he doesn’t appear until relatively late in the story. He isn’t mentioned until about 8000 words into the 38,000-word text, nor do we learn much about him until a long paragraph that starts roughly 23,000 words in. That paragraph, which I’ve called the nexus, is structurally central to the text and is roughly 1500 words long.
I decided to investigate Kurtz’s presence in the text by the simple expedient of noting where the name “Kurtz” occurs. The result, my colleague Tim Perper subsequently told me, is what’s called a periodogram:
Figure 1: Periodicity in the appearance of “Kurtz”
What I Did (aka Method)
1) Working in MSWord, I first counted the number of times “Kurtz” appeared in the text by doing a search and replace on “Kurtz,” which told me how many items had been replaced. Result: 122. I did not attempt to identify pronominal references to Kurtz or any more sophisticated references to him.
2) I then searched for each occurrence of “Kurtz” and noted the word number in a spreadsheet. I found 122 occurrences, as expected.
3) I tried graphing the raw occurrences of “Kurtz” over the interval 0–38,500. That gave me too many dots bunched together to see anything. So I divided the interval into bins of 500 words, counted the occurrences in each bin, and graphed the result as a column chart (red), which you see in Figure 1 above. I also added a trend line (black).
4) At Tim Perper’s suggestion I made a minimal check on the possibility that my result had more to do with bin size than with any pattern in the occurrence of “Kurtz.” Tim suggested that I pick the average paragraph length as a basic bin size, then double it, triple it, and so on. The average paragraph length is 198 words, alas too small for my error-prone manual counting, so I chose a bin of roughly three times that: 600 words. Here’s the result:
Figure 2, Check on bin size
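The manual steps above (find each “Kurtz,” record its word number, count occurrences per bin) could also be scripted. What follows is a minimal Python sketch, not the procedure actually used here. It assumes a “word” is any whitespace-delimited token, so its word numbers may differ slightly from MSWord’s, and it uses substring matching so that “Kurtz’s” and “Kurtz,” both count, roughly as a word-processor search would. The function names are hypothetical.

```python
def word_positions(text, target="Kurtz"):
    """Return 1-based word numbers of every whitespace-delimited word containing target.

    Substring matching counts "Kurtz's" and "Kurtz," alike; pronouns are not counted.
    """
    words = text.split()
    return [i for i, w in enumerate(words, start=1) if target in w]


def bin_counts(positions, total_words, bin_size):
    """Count positions per bin: bin i covers words i*bin_size+1 through (i+1)*bin_size."""
    n_bins = -(-total_words // bin_size)  # ceiling division keeps a final partial bin
    counts = [0] * n_bins
    for p in positions:
        # Clamp so a stray position past total_words still lands in the last bin
        counts[min((p - 1) // bin_size, n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    sample = "a b Kurtz c Kurtz's d e f Kurtz, g"
    pos = word_positions(sample)
    print(pos)
    # Same positions, two bin sizes, as in the bin-size check of step 4
    print(bin_counts(pos, len(sample.split()), 4))
    print(bin_counts(pos, len(sample.split()), 5))
```

On the full text, the bin-size check in step 4 amounts to calling `bin_counts` on the same positions with 500 and then 600 and comparing the two series. A word at position 7986, for example, falls in the bin covering words 7501 through 8000.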
“Kurtz” appears in the text of Heart of Darkness at periodic intervals. There is a short cycle of roughly 2000 words and a longer one that divides the text into four sections: an initial section with no appearances, a second section with relatively low activity, and a third section with more activity. The last, rather short, section returns to a low level of activity; we must, however, be careful in interpreting this aspect of the results (see discussion).
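One way to probe the short cycle more formally would be to compute a periodogram in the standard signal-processing sense: the squared magnitude of the discrete Fourier transform of the mean-removed bin counts, reported as period (in words) against power. The sketch below uses only the Python standard library. The function names are hypothetical, and it has not been run on the actual counts. With about 77 bins of 500 words and only 122 occurrences, any peak it finds should be read with caution.

```python
import cmath
import math


def periodogram(counts, bin_size):
    """Return (period_in_words, power) pairs for a mean-removed series of bin counts."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    result = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at k cycles per n bins
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        result.append((n * bin_size / k, abs(coeff) ** 2 / n))
    return result


def dominant_period(counts, bin_size):
    """Period (in words) carrying the largest periodogram power."""
    return max(periodogram(counts, bin_size), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Synthetic counts that repeat every 4 bins, i.e. every 2000 words at 500 words per bin
    synthetic = [0, 1, 2, 1] * 8
    print(dominant_period(synthetic, 500))
```

A dominant period near 2000 words in the real 500-word counts would be consistent with the short cycle described above; a peak that moves when the bin size changes to 600 would suggest a binning artifact instead.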
Implicit in this investigation is the notion that we can treat individual words as units of time, and that short, medium, and long words all count the same. This assumption may not be sufficiently valid for the above results to hold. Still, reading is an activity that occurs in time. In the absence of an awkward laboratory procedure that tracks a reader’s eye movements through the text, so that we know exactly what is being read and when, it seems reasonable to use word position as a proxy for time and attention.
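The word-as-time-unit assumption can be checked cheaply by also measuring each occurrence’s position in characters, which gives long words more weight than short ones. The following Python sketch is hypothetical (the function name and the whitespace definition of a word are assumptions). It returns, for each occurrence, its fractional position through the text by word count and by character offset.

```python
import re


def relative_positions(text, target="Kurtz"):
    """For each word containing target, return (fraction by word index, fraction by character offset)."""
    matches = list(re.finditer(r"\S+", text))  # whitespace-delimited words with their offsets
    total_words = len(matches)
    total_chars = len(text)
    out = []
    for i, m in enumerate(matches):
        if target in m.group():
            out.append((i / total_words, m.start() / total_chars))
    return out


if __name__ == "__main__":
    for by_word, by_char in relative_positions("Kurtz ab Kurtz"):
        print(f"{by_word:.3f} {by_char:.3f}")
```

If the two fractions track each other closely across the whole text, the choice of unit would make little difference to the binned counts.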
With this caveat in mind, if I’d been graphing the appearance of a leitmotif in a Wagner opera, no one would be surprised that it exhibited periodic behavior. But this is a prose text. Why should a particular motif exhibit periodic behavior?
Because human life exhibits periodic behavior on various time scales?
The so-called nexus paragraph, which I’ve identified as being structurally central, starts at word 23,318 and runs for roughly 1500 words to 24,819. This is the first full discussion of Kurtz, and the biggest single “chunk” of information we have about him. Prior to this point in the text he was just some important but enigmatic fellow we’d be meeting sometime, maybe, in the future.
It is around the beginning of the nexus that the Kurtz-count first rises above 3 in Figure 1. The nexus seems to have shifted the ‘baseline’ of the K-factor, as I will call it, so that K-action is overall more frequent from here to just before the end. But cycling continues around that higher baseline.
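The baseline shift can be stated as a simple comparison of rates: occurrences per 500 words before the nexus begins (word 23,318) and after it. The helper below is a hypothetical Python sketch. The word number at which the final conversation begins is not given here, so the end of the second interval is left as a parameter rather than filled in.

```python
def rate_per_words(positions, start, end, per=500):
    """Occurrences per `per` words for word numbers in the half-open interval [start, end)."""
    hits = sum(1 for p in positions if start <= p < end)
    return hits * per / (end - start)


if __name__ == "__main__":
    toy_positions = [10, 20, 600, 700, 800]
    print(rate_per_words(toy_positions, 0, 500))     # 2 hits in 500 words
    print(rate_per_words(toy_positions, 500, 1000))  # 3 hits in 500 words
```

On the real data one would compare `rate_per_words(positions, 1, 23318)` with `rate_per_words(positions, 23318, end_before_intended)`, where `end_before_intended` is a hypothetical word number marking the start of the conversation with the Intended.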
The drop in apparent activity at the end corresponds to the final conversation between Marlow and the Intended. That conversation, of course, is about Kurtz. The K-factor would thus be very high at that point even though his name hardly occurs at all: since the conversation is entirely about Kurtz, his name need not be mentioned.
“Kurtz” first appears in the 7500-8000 interval (at 7986 and 7992). I am considering the possibility that the K-factor is active before that point, but it is not being expressed in the text by a mention of “Kurtz.” This possibility is what prompts me to talk of a K-factor, as opposed to just occurrences of “Kurtz.”
I’ve not yet attempted to verify this hypothesis. Any such attempt is obviously fraught with danger: having hypothesized that the K-factor will show itself in certain intervals, I may proceed to find it there, somehow, anyway, but not in between those intervals.
Comparison with Paragraph Distribution
In a previous post in this series (and here as well) I looked at the distribution of paragraph lengths as a function of paragraph magnitude and of paragraph order in the text. I note that paragraph length is a purely formal matter and would not seem to have any deep dependence on what is said in the paragraph. By contrast, the appearance of a particular word in the text (in this case a word that is central to the text because it is the name of the central figure) is a matter of content. That this word recurs periodically, however, could be considered a formal feature as well. Periodicity, after all, is central to the formal analysis of both lyric and dramatic poetry.
It is thus of no little interest that these two logically independent measures should ‘mark’ the same two regions of the text, the nexus and the concluding paragraph.
In the distribution of paragraphs by serial order, the nexus paragraph is the longest one in the text and, indeed, is the apex of a distribution whose envelope is crudely pyramidal in shape. That distribution also showed the concluding conversation between Marlow and the Intended to be the single longest string of short paragraphs in the text.
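Both paragraph-based markers mentioned here, the single longest paragraph and the longest string of short paragraphs, can be located with a few lines of code. This Python sketch assumes paragraphs are separated by blank lines and that “short” means fewer than some threshold number of words. The threshold and the function names are hypothetical choices, since the earlier post’s criterion is not restated here.

```python
def paragraph_lengths(text):
    """Word count of each paragraph, assuming paragraphs are separated by blank lines."""
    return [len(p.split()) for p in text.split("\n\n") if p.strip()]


def longest_short_run(lengths, threshold):
    """Return (start_index, run_length) of the longest run of paragraphs shorter than threshold."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, n in enumerate(lengths):
        if n < threshold:
            if run_len == 0:
                run_start = i  # a new run of short paragraphs begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a long paragraph breaks the run
    return best_start, best_len


if __name__ == "__main__":
    lengths = [100, 5, 3, 200, 4, 2, 6, 300]
    print(longest_short_run(lengths, 10))
    # Index of the single longest paragraph
    print(max(range(len(lengths)), key=lengths.__getitem__))
```

Since the result depends on the threshold, it would be worth checking whether the concluding conversation remains the longest run across a few plausible thresholds.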
As discussed above, both of these regions are also marked in the analysis of K-factor periodicity. The nexus paragraph marks the transition to a higher level of activity (that is, Kurtz is mentioned more frequently from that point on), but the short cycle periodicity remains. The concluding conversation marks the return of “Kurtz” appearances to a lower level.
At this point the traditional literary critic might well observe that one doesn’t need these two measures to ascertain that those two regions of the text are important. This critic is, of course, correct. I’d pointed out both regions in my initial post on Heart of Darkness, which I wrote well before I’d serendipitously made these two measurements, or observations if you will. I discussed the final conversation under the rubric of closure and the nexus paragraph under the rubric of temporal displacement. So, one does not need such observations to find such things.
But that’s not the point. One makes measurements or observations as a prelude to further intellectual activity, activity that may, in the long run, help us to explain why literary texts (or is it simply literary texts of this very high quality?) exhibit these features. Further, it would be a mistake to see the traditional qualitative methods of close analysis as being in competition with the emerging quantitative methods of the digital humanities. This is not a race to see who can be the first one to claim the pot of gold at the end of the rainbow. This is an investigation into the structures and processes of the human mind and culture.
The subject is deep, but obscure. We need any and all sources of insight. How can the quantitative methods of the newer psychologies enhance and extend results obtained by close reading? How can close reading identify textual features that might bear fruit through quantitative investigation? We should think more about such questions and less about intellectual turf guarding.