Report on Cultural Evolution for the National Humanities Center, Revised Edition

Back in 2010 I wrote a piece for the National Humanities Center (USA), Cultural Evolution: A Vehicle for Cooperative Interaction Between the Sciences and the Humanities, which is online at their Forum along with comments. I have since revised it to include a section on Jockers, Macroanalysis: Digital Methods & Literary History (2013). You can download the revised version from my SSRN page. I’ve placed the added section below.

A Start: 19th Century Anglophone Literary Culture

Let me set the stage by quoting a passage from the excellent review Tim Lewens (2014) wrote for the Stanford Encyclopedia of Philosophy:

The prima-facie case for cultural evolutionary theories is irresistible. Members of our own species are able to survive and reproduce in part because of habits, know-how and technology that are not only maintained by learning from others, they are initially generated as part of a cumulative project that builds on discoveries made by others. And our own species also contains sub-groups with different habits, know-how and technologies, which are once again generated and maintained through social learning. The question is not so much whether cultural evolution is important, but how theories of cultural evolution should be fashioned, and how they should be related to more traditional understandings of organic evolution.

We can see that kind of process, one that builds on discoveries made by others, in a graphic that Matthew Jockers used late in Macroanalysis: Digital Methods & Literary History (2013), though that’s not what Jockers had in mind in that particular investigation. He was working with a corpus of 3346 nineteenth-century novels by American, British, Irish, and Scottish authors and was interested in tracking influence among them. It is one thing to track influence among a handful of texts; that is the ordinary business of traditional literary history. You read the texts, look for similar passages and motifs, read correspondence and diaries by the authors, and so forth, and arrive at judgements about how the author of some later text was influenced by authors of earlier texts.

It’s not practical to do that for over 3000 texts, most of which you’ve never read, nor has anyone else read them in over 100 years. Jockers was using recently developed techniques for analyzing “big data,” in this case, a pile of 19th Century Anglophone novels. Without going into the details (you can find most of them in Jockers, pp. 156 ff.), Jockers had the computer ‘measure’ each text on almost 600 different traits and then calculated the pair-wise similarity of all the texts. He then tossed out all values below a certain relatively high threshold and had the computer create a network visualization of the remaining connections. Each text is represented as a ‘node’ in the network and the similarity between two texts is represented by the ‘edge’ (or link) connecting them. The visualization places each text next to the texts it most resembles, so the more similar two texts are, the closer together they sit in the resulting image. Here’s that image (Figure 9.3 in the book, p. 165, color version from the web):


It turns out that the visualization routine laid the graph out more or less in chronological order, going from older to newer, left to right. Note that there was no temporal information in the data from which that graph was derived (pp. 164-65):

The fact that they line up in a chronological manner is incidental, but rather extraordinary. The chronological alignment reveals that thematic and stylistic change does occur over time. The themes that writers employ and the high-frequency function words they use to build the frameworks for their themes are nearly, but not always, tethered in time. At this macro scale, style and theme are observed to evolve chronologically, and most books and authors in this network cluster into communities with their chronological peers. Not every book and not every author is a slave to his or her epoch.
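For readers who want a concrete picture of the measure, compare, and threshold procedure Jockers used, here is a minimal Python sketch of that step. It is not Jockers’ code: the choice of cosine similarity, the 0.95 threshold, the function names, and the three-trait toy vectors for four imaginary novels are all my assumptions for illustration (Jockers measured each real text on 578 features; his actual procedure is on pp. 156 ff.). The layout step that actually draws the network is left out.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_network(features, threshold):
    """Compare every pair of texts and keep only the pairs whose
    similarity meets or exceeds the threshold; weaker links are discarded."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        sim = cosine(features[a], features[b])
        if sim >= threshold:
            edges.append((a, b, round(sim, 3)))
    return edges

# Hypothetical toy data: four imaginary "novels", each measured on three traits.
novels = {
    "novel_A": [0.9, 0.1, 0.3],
    "novel_B": [0.8, 0.2, 0.3],
    "novel_C": [0.1, 0.9, 0.4],
    "novel_D": [0.2, 0.8, 0.5],
}

for edge in similarity_network(novels, threshold=0.95):
    print(edge)
```

With these made-up numbers only two pairs, (novel_A, novel_B) and (novel_C, novel_D), clear the threshold, so the network falls into two separate clusters. Raising or lowering the threshold prunes or adds edges, which is why the choice of a relatively high cutoff matters for what the final picture shows.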

As for Jockers’ first sentence, the result is neither incidental nor extraordinary IF an evolutionary process regulates cultural change. For evolution proceeds through “descent with modification,” as Darwin put it, and that goes for cultural as well as biological evolution. If a later individual is modified from its immediate predecessors, it will in fact resemble them a great deal; the modifications do not change the basic character of the descendants.

As his language indicates, Jockers wasn’t looking for THAT result. It surprised him. Though he alludes to cultural evolution here and there in the book, he rejected it as a basic premise of his investigation (pp. 171-172). The evolutionary interpretation is mine, not his.

We must further realize that that interpretation is an assertion about the collective mentality. Jockers wasn’t examining the minds of millions of 19th century readers of English-language novels in Britain and America, but the history of those novels is a function of the tastes and interests of those readers. Those books wouldn’t have been written if publishers didn’t think they could sell them to the public. Those tastes changed gradually, with the themes and styles of novels appealing to those tastes changing gradually as well.

The study of cultural evolution is thus the study of collective mentality. We are interested in the collective psyche. How can we think of the collective psyche without falling into hopeless mysticism?

Reading Macroanalysis: Notes on the Evolution of Nineteenth Century Anglo-American Literary Culture

Matthew L. Jockers. Macroanalysis: Digital Methods & Literary History. University of Illinois Press, 2013. x + 192 pp. ISBN 978-0252-07907-8

I’ve compiled all the posts into a working paper, which is available at SSRN. Abstract and introduction below.

Abstract: Macroanalysis is a statistical study of a corpus of 3346 19th Century American, British, Irish, and Scottish novels. Jockers investigates metadata; the stylometrics of authorship, gender, genre, and national origin; themes, using a 500-item topic model; and influence, developing a graph model of the entire corpus in a 578-dimensional feature space. I recast his model in terms of cultural evolution where the dynamics are those of blind variation and selective retention. Texts become phenotypical objects, words become genetic objects, and genres become species-like objects. The genetic elements combine and recombine in authors’ minds but they are substantially blind to audience preferences. Audiences determine whether or not a text remains alive in society.

Introduction: Get in the Driver’s Seat

I knew it was going to be good. But not THIS good. A better formulation: I didn’t know it would be good in THIS way, that it would put me in the driver’s seat, if only in a limited way.

The driver’s seat? What do you mean, you ask? In this case it means that I could actively work with the data. When, for example, I read Moretti’s Graphs, Maps, Trees, I read it as I do pretty much any book, though this one had a bunch of charts and diagrams, which is unusual for literary criticism. There wasn’t anything for me to do other than just read.

If I didn’t have ready access to the web, reading Macroanalysis would have been the same. But I do have web access and I use it all the time. So, when I got to Chapter 8, “Theme,” I also accessed the topic browser that Jockers had put on the web. Through this browser I could explore the topic model Jockers used in the book and, in particular, I could use it to investigate matters that Jockers hadn’t considered.

So I moved from thinking about Jockers’ work to using his work for my own intellectual ends. I ended up writing four posts (6.1 – 6.4) on that material totaling almost 12,000 words and I don’t know how many charts and graphs, all of which I got from Jockers’ web site. Once I’d worked through an initial curiosity about a spike that looked like Call of the Wild (but wasn’t, because that text isn’t in the database) I settled into some explorations framed by Leslie Fiedler’s Love and Death in the American Novel, Melville’s Moby Dick, and Edward Said’s anxiety on behalf of the autonomous existence of the aesthetic realm.

Data is Independent of Interpretations

You can do that as well, or whatever you wish. While the topic browser gives you only limited access to Jockers’ corpus, that access is real and useful. A lot of work in digital criticism, and digital humanities in general, is like that. It produces ‘knowledge utilities’ that are generally useful, not just the private preserves of the original investigator.

There is an important epistemological point here as well. Jockers was led to this work by a certain set of intellectual concerns. Some of those concerns are quite general–about literature and the novel–while others are more specific–he has a particular interest in Irish and Irish-American literature. But I had no trouble putting his results to use in service of my own somewhat different interests.

From Macroanalysis to Cultural Evolution

The purpose of this post is to recast the work reported in Macroanalysis: Digital Methods & Literary History in terms appropriate to cultural evolution. The idea is to propose a model of cultural evolution and assign objects from Jockers’ analysis to play roles in that model. I will leave Jockers’ work untouched. All I’m doing is reframing it.

Before doing that, however, I should note that in the last quarter of a century or so there has been quite a lot of work on cultural evolution in a variety of disciplines including linguistics, anthropology, archaeology, and biology. Though it must be done at some time, I have no intention of even attempting to review that work here and so to place the scheme I propose in relation to it. That’s a job for another time and another venue. I note, however, that I have done quite a bit of work on cultural evolution myself and that some of that discussion can be found in documents I list at the end of this post.

Why Evolution?

First of all, why bother to recast the processes of literary history in evolutionary terms at all? Jockers wrote an excellent book without creating an evolutionary model, though he mentioned evolution here and there. What’s to be gained by this recasting?

As far as I can tell, much of the work that has been done on cultural evolution has been undertaken simply to exercise and extend the range of evolutionary discourse. It has not, as yet, resulted in an understanding of cultural process that is deeper than more conventional forms of historical discourse. Much of my own work has been undertaken in this spirit. I believe that, yes, at some point, evolutionary explanation will prove more robust than other forms of explanation, but we’re not there yet.

In effect, this work looks to evolutionary accounts as exhibiting something like formal cause in Aristotle’s sense. Evolutionary accounts are about the distribution of traits across populations. In biology such accounts have a characteristic formal appearance so that, e.g., phylogenetic analysis of a population of entities tends to “look” a certain way. So, in the cultural sphere, let’s conduct a similar analysis and see how things look even if we don’t have our entities embedded in the kind of causal framework that genetics and population biology, molecular biology, and developmental biology provide the biologist.

That’s fine, as long as we remind ourselves periodically that that’s what we’re doing. But we must keep looking for the terms in which to construct a causal model.

What I specifically want from an evolutionary approach to culture is:

  • a way to think about Said’s autonomous aesthetic realm,
  • a way to prove out Shelley’s assertion that “poets are the unacknowledged legislators of the world,”
  • a way of restoring agency to writers and readers rather than casting them as puppets of various vast and impersonal forces, and
  • a way of thinking about the canon in relation to the whole of literary culture.

That’s what I want. Those requirements imply having a causal model. Whether or not I’ll get it, that’s another matter.

Current critical approaches, however, in which individual humans are but nodal points in the machinations of vast and impersonal hegemonic forces, have trouble on all these points. Individual human beings are deprived of agency, turning readers into zombies watching the ghosts of dead authors flicker on the remaining walls of Plato’s cave. The canon is captive to those same hegemonic forces, which have promulgated Shelley’s defense as an opiate for the masses, which R’ us.

The critical machine is broken. It's time to start over. Before we do that, however, I need to dispense with one objection to seeking an evolutionary account of cultural phenomena.

Reading Macroanalysis 6.1: Theme–Dogs, Gold, Slavery, and Awakening

Over at New Savanna I’ve been blogging my way through Matthew Jockers, Macroanalysis: Digital Methods & Literary History, University of Illinois Press, 2013. I figured this particular post would be of interest here. If you’re not familiar with topic analysis, there are some links below that’ll help you out.

Chapter 8 of Macroanalysis is about “Theme.” Jockers uses topic analysis to investigate the occurrence of 500 ‘themes’ in a corpus of 3,346 19th-century British, American, and Irish books. He opens with a bit of intellectual history, from the Russian Formalists to Google’s Ngrams; then he launches into topic analysis, which emerged at the turn of the millennium, gives some simple examples, and then gets serious. But I’m going to skip over all of that for now.

For one thing, I’ve been through the topic analysis drill several times in the past year or so and don’t want to go through it again. If you need an introduction or a review, check out Topic Models: Strange Objects, New Worlds, or, in this series, Reading Macroanalysis 5: An Interlude on Scale: Micro, Meso, and Macro. For another, Jockers has put a topic tool online, 500 Themes from a corpus of 19th-Century Fiction. Those are the topics he discusses in this chapter.

Once I was done reading the chapter I started playing with the tool. I’d pick a topic and then look at the graphics:

  1. a word cloud to display the most frequent words in the topic,
  2. a bar chart indicating usage of the topic by author gender (male, female, and undetermined),
  3. a line graph showing gender usage over time,
  4. a bar chart indicating usage of the topic by author nationality (American, British, Irish), and
  5. a line graph showing national usage over time.

At first I was just browsing, moving from one theme to the next. But then I hit one that grabbed my attention.

So I spent the next couple of hours looking at themes and thinking about them. I’m going to devote the rest of this post and the next one to showing what I found. Then I’ll do a third post where I review what Jockers found and recast the enterprise in terms of cultural evolution.

Note that in all of this I’m just playing around, but in a serious way. It is all preliminary and provisional. I haven’t reached any firm conclusions on the particular themes I look at. The only thing I’m sure about is that this, and similar techniques, are going to revolutionize the way we do literary history.

Before proceeding, however, two caveats are necessary. While Jockers’ corpus is substantial, it isn’t every British, American, and Irish novel written in the 19th Century. Perhaps more important, it is natural to read these theme charts as reflecting the interests of the 19th Century reading public. And in some sense that is so. But we have to be careful.

For some of these books were more widely read than others and a few of them, the canonical ones, are still being read. But the extent of a book’s readership is not reflected in the data. The fact that a book was published at all implies, of course, that someone thought there was an audience for it. But a publisher’s interest isn’t quite the same as a reader’s interest. We simply don’t know how accurately publisher interest tracks reader interest. With those reservations in mind, let’s take a look.

Of Dogs and Gold

In the course of browsing through Jockers’ themes menu I saw “DOGS.” Let’s look at that, I thought. Why dogs? you may ask. No deep reason, but some years ago, way back in graduate school in fact, I’d noticed that dogs figured as a significant motif in Wuthering Heights. Major transitions among humans were marked by violence between dogs and humans (e.g. Lockwood arrives and is greeted by a barking dog, Catherine gets bitten by Skulker; see this post). More recently, I’d read a handful of articles about the domestication of dogs during human evolutionary history. I was just curious.

Here’s the word cloud for the DOGS topic:

[Figure: word cloud for the DOGS topic]

The following graph stunned me. It depicts the occurrence of the dog topic by author’s gender over the course of the century. The medium gray line depicts male authors, the black line females, and the light gray line, authors where the gender was undetermined.

[Figure: DOGS topic usage by author gender, by year]

What's that spike at the right edge? As soon as I realized that it was for male authors I thought, "Jack London, Call of the Wild." I also had some doubts as to whether that book was in the corpus, as I didn't believe the book was 19th Century, though I wasn't sure. But that doubt didn't stop me from nosing around. By the time I'd confirmed for myself that it wasn't 19th Century (it was published in 1903) and Jockers had gotten back to me that, no, it wasn't in the corpus, I'd already had too much fun browsing through the charts and had moved on to other topics (which I'll get to in the next post).

Beyond Quantification: Digital Criticism and the Search for Patterns

I’ve collected some recent posts (from New Savanna) on patterns into a working paper. It’s online at SSRN. Here’s the abstract and the introduction.

Abstract: Literary critics seek patterns, whether patterns in individual texts or patterns in large collections of texts. Valid patterns are taken as indices of causal mechanisms of one sort or another. Most abstractly, a pattern emerges or is enacted as some machine makes its way in an environment. An ecological niche is a pattern “traced” by an organism in its environment. Literary texts are themselves patterns traced by writers (and readers) through their life worlds. Patterns are frequently described through visualizations. The concept of pattern thus dissolves the apparent conflict between quantification and meaning, for quantification is but a means to describing a pattern. It is up to the critic to determine whether or not a pattern is meaningful by identifying the mechanism that produced the pattern. Examples from Shakespeare and Joseph Conrad.

Introduction: Patterns and Descriptions

There is a sense, of course, in which I’ve been aware of and have been perceiving and thinking about patterns all my life. They are ubiquitous after all. But it wasn’t until I began studying cognitive science with the late David Hays that “pattern” became a term of art. Hays and his students were developing a network model of cognitive structure; such models became common in the 1970s. Such networks admit of two general kinds of computational process, path tracing and pattern recognition. Path tracing is computationally easy, while pattern recognition is not. Human beings, however, are very good at perceiving and recognizing patterns.
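To make the contrast concrete, here is a minimal sketch of path tracing in Python, under the assumption that a cognitive network can be modeled as a plain directed graph of concept labels. The toy network and the function name trace_path are hypothetical, and the technique is ordinary breadth-first search, not Hays’s own formalism. Each step simply follows a link to a neighboring node, which is why tracing is cheap; recognizing a pattern would mean matching a whole configuration of nodes and links at once, which this sketch does not attempt.

```python
from collections import deque

def trace_path(network, start, goal):
    """Breadth-first path tracing: follow links outward from `start`
    until `goal` is reached; return the node sequence, or None."""
    previous = {start: None}          # records how each node was reached
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in network.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                frontier.append(neighbor)
    return None

# Hypothetical toy network of concepts (directed links).
concepts = {
    "collie": ["dog"],
    "dog": ["animal", "bark"],
    "animal": ["living thing"],
    "bark": ["sound"],
}

print(trace_path(concepts, "collie", "living thing"))
```

The cost grows only with the number of links visited, so even a large network can be traced quickly, whereas no comparably simple procedure exists for spotting an arbitrary pattern.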

What put the idea before me, though, as something demanding specific thought, were remarks Franco Moretti made in coming to grips with his work on the network analysis of plot structure. In Network Theory, Plot Analysis (Literary Lab Pamphlet 2, 2011, p. 11) he noted that he "did not need network theory; but I probably needed networks…. What I took from network theory were less concepts than visualization." We then examine the visualizations to determine whether or not they indicate patterns that are worth further exploration.

Digital Criticism Comes of Age, a Post at 3QD

I’ve got a new post at 3 Quarks Daily: The Only Game in Town: Digital Criticism Comes of Age.

I open with Moretti, natch, then move to Willard McCarty’s 2013 Busa Award Lecture, where he talks of embracing the computer as Other. I end with Said on his belief in an autonomous aesthetic realm, despite the difficulties of conceptualizing how it could possibly work. The thrust of the article, though, is whether or not we can actually get this venture moving, really moving. What are the chances of really embracing the Other?

Though I made my peace with the computer years ago, and so am biased, I don’t know the answer to that question. But I’ve made some progress in figuring out what that question entails and that forms the bulk of my essay.

The issue is one that’s been with academic literary study since the early 20th Century. In the 1920s the matter was stated most succinctly by Archibald MacLeish: a poem should not mean but be. In the late 1950s we find it in the “Polemical Introduction” to Northrop Frye’s well-known Anatomy of Criticism (pp. 27-28):

The reading of literature should, like prayer in the Gospels, step out of the talking world of criticism into the private and secret presence of literature. Otherwise the reading will not be a genuine literary experience, but a mere reflection of critical conventions, memories, and prejudices. The presence of incommunicable experience in the center of criticism will always keep criticism an art, as long as the critic recognizes that criticism comes out of it but cannot be built on it.

The issue came home to me in a rejection letter for my first essay on “Kubla Khan” – which ended up going into Language and Style in 1985 – where the reviewer complained that the essay “ought to argue with itself, to put into question some of the patterns it establishes-or better, perhaps to let the poem talk back.”

What does he mean, “let the poem talk back”? I know very well that the statement isn’t meant to be taken literally. But what’s the non-literal version of the statement? Under what circumstances could a poem do something like talk back?

Under face-to-face performance circumstances. To be sure, the poem doesn't talk, but the poet does. The poet recites the poem, the teller spins the tale, the audience reacts with silence, groans, laughter, remarks, and the poet replies. There we and the poet/story-teller are on an even footing, in the same "space," one that really IS interactive. But criticism really isn't like that, no matter how much this or that critic wishes otherwise.

Toward a Computational Historicism. Part 4: Into the Autonomous Aesthetic

This is the fourth and last in a series of posts that began with Discourse and Conceptual Topology, moved to From History to Abstraction, and then Abstraction at the Time Scale of History.

In the 6th pamphlet from Stanford’s Literary Lab, “Operationalizing”: or, the Function of Measurement in Modern Literary Theory, Franco Moretti ended with a call to explicate the theoretical consequences of computing for literary study. That’s what I’ve been doing. It is now time to wrap up the exposition.

Let us begin with a passage from one of the last essays published by Edward Said, Globalizing Literary Study (PMLA, Vol. 116, No. 1, 2001, pp. 64-68). In his second paragraph Said notes: “An increasing number of us, I think, feel that there is something basically unworkable or at least drastically changed about the traditional frameworks in which we study literature” (p. 64). Agreed. He goes on (pp. 64-65):

I myself have no doubt, for instance, that an autonomous aesthetic realm exists, yet how it exists in relation to history, politics, social structures, and the like, is really difficult to specify. Questions and doubts about all these other relations have eroded the formerly perdurable national and aesthetic frameworks, limits, and boundaries almost completely. The notion neither of author, nor of work, nor of nation is as dependable as it once was, and for that matter the role of imagination, which used to be a central one, along with that of identity has undergone a Copernican transformation in the common understanding of it.

What has happened to all those things, as Alan Liu has noted in "The Meaning of the Digital Humanities" (PMLA 128, 2013, 409-423), is that they have dissolved into vast networks of objects and processes interacting across many different spatial and temporal scales, from the syllables of a haiku dropping into a neural net through the process of rendering ancient texts into movies made in Hollywood, Bollywood, or "Chinawood" (that is, Hengdian, in Zhejiang Province) and shown around the world.

Toward a Computational Historicism. Part 3: Abstraction at the Time Scale of History

Poets are the hierophants of an unapprehended inspiration; the mirrors of the gigantic shadows which humanity casts upon the present; the words which express what they understand not; the trumpets which sign to battle, and feel not what they inspire; the influence which is moved not but moves. Poets are the unacknowledged legislators of the world.
–Percy Bysshe Shelley

In the first post in this series, Discourse and Conceptual Topology, I reviewed network models on three scales, micro, meso, and macro. In the second post, From History to Abstraction, I moved to the micro scale and argued that the mechanism of abstraction proposed by David Hays gives us a way of thinking about how a historical process can lead to subsequent abstraction and illustrated the model through an examination of Shakespeare’s Sonnet 129. In this post I examine Heuser and Le-Khac on the 19th Century British novel and undertake a formal comparison of The Winter’s Tale and Wuthering Heights in which I argue that Brontë had the advantage of conceptual machinery unavailable to Shakespeare, though in some way anticipated by him. I hope to conclude this series with a fourth post in which I return to purely theoretical and methodological matters.

History: Showing and Telling

As we all know, one of the major problems of literary studies up to now is that it has concentrated its attentions on a relatively small body of texts, the so-called canon, and has allowed the examination of those texts to stand as a proxy for all of literary history. The assumption is either that, because of their quality, those are the only texts that matter or, perhaps, their quality allows them to “stand-in” for the rest. The widespread availability of powerful computers now allows us to put these assumptions to the test or, rather, simply to abandon them.


Toward a Computational Historicism. Part II: From History to Abstraction

I examined three different uses of network visualizations (topic models, Moretti’s plot diagrams, and cognitive networks) in the first part of this essay, Discourse and Conceptual Topology. When I posted that, I imagined only a second part. In the writing, though, that second part grew and grew, so I cut it in two.

In this part I pose the problem of time and discuss two essays by Stephen Greenblatt, “The Cultivation of Anxiety: King Lear and His Heirs” and “Psychoanalysis and Renaissance Culture,” and then compare Amleth (Saxo Grammaticus) with Hamlet (Shakespeare). I then move back to cognitive networks and talk about Hays’s concept of metalingual definition and conclude with more Shakespeare, Sonnet 129. I’ll get to Heuser and Le-Khac in Part 3: Prophesy.

Time and History

For physics, I understand, time presents a problem. It seems to have a direction, as some processes are irreversible. Why? If you drop a small quantity of ink into a tumbler of water – as I did in A Primer on Self-Organization: With some tabletop physics you can do at home – it diffuses, irreversibly so. The ink particles never collect together into the compact volume they had when first dropped into the water. Why?


Toward a Computational Historicism. Part 1: Discourse and Conceptual Topology

Poets are the unacknowledged legislators of the world.
– Percy Bysshe Shelley

… it is precisely because we are talking about ordinary language that we need to adopt a notation as different from ordinary language as possible, to keep us from getting lost in confusion between the object of description and the means of description.
–Sydney Lamb

Worlds within worlds – that’s how Tim Perper, my friend and colleague, described biology. At the smallest scale we have individual molecules, with DNA being of prime importance. At the largest scale we have the earth as a whole, with all living beings interacting in a single ecosystem over billions of years. In between we have cells, tissues, and organs of various sizes, autonomous organisms, populations of organisms on various scales from the invisible to continent-spanning, and interactions among populations of organisms on various scales.

Literature too is like that, from single figures and tropes, even single words (think of Joyce’s portmanteaus) through complete works of various sizes, from haiku to oral epics, from short stories through multi-volume novels, on to whole bodies of literature circulating locally, regionally, across continents and between them, from weeks and years to centuries and millennia. Somehow we as humanists and literary critics must comprehend it all. Breathtaking, no?

In this essay I sketch a potential computational historicism operating at multiple scales, both in time and textual extent. In the first part I consider network models on three scales: 1) topic models at the macroscale, 2) Moretti’s plot networks at the mesoscale, and 3) cognitive networks, taken from computational linguistics, at the microscale. I give examples of each and conclude by sketching relationships among them. I open the second part by presenting an account of abstraction given by David Hays in the early 1970s; in this model abstract concepts are defined over stories. I then move on to Heuser and Le-Khac on 19th Century novels, Stephen Greenblatt on self and person, and consider several texts, Amleth, Hamlet, The Winter’s Tale, Wuthering Heights, and Heart of Darkness.

Graphs and Networks

To the mathematician the image below depicts a topological object called a graph. Civilians tend to call such objects networks. The nodes or vertices, as they are called, are connected by arcs or edges.


Such graphs can be used to represent many different kinds of phenomena: a road map is an obvious example, a kinship tree is another, and sentence structure is a third. The point is that such graphs are signs of phenomena, notations. They are not the phenomena themselves.
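The point that a graph is a notation, not the phenomenon it stands for, can be seen in a few lines of Python. In this minimal sketch (the town names, the simplified three-generation kinship chain treated as undirected, and the helper names make_graph and degree_sequence are all hypothetical), a road map and a line of descent are written in the same adjacency-map notation and come out with identical structure; in this tiny case both are simply three-node paths.

```python
def make_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbors)
    from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_sequence(graph):
    """Sorted list of how many edges touch each node."""
    return sorted(len(neighbors) for neighbors in graph.values())

# Two hypothetical phenomena encoded in the same notation.
road_map = make_graph([("Ayton", "Beeton"), ("Beeton", "Ceeton")])
kinship = make_graph([("grandparent", "parent"), ("parent", "child")])

print(degree_sequence(road_map))
print(degree_sequence(kinship))
```

Nothing in either graph says which one is about roads and which about families; that knowledge lives outside the notation, in whoever reads it.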