The idea is to take some real data, then throw away Occam’s razor and go after the most complicated explanation possible.
I foresee no problems.
As soon as I finished up my series of posts about Matt Jockers, Macroanalysis: Digital Methods & Literary History, I set up a file on my Mac for further thoughts, knowing full well I’d keep thinking about the book. I’ve now posted the first of those continuing thoughts at 3 Quarks Daily: Macroanalysis and the Directional Evolution of Nineteenth Century English-Language Novels.
The issue is cultural evolution, a notion that Jockers flirts with, but rejects. Of course I’ve been committed to the idea for a long time and I’ve decided that his data, that is, the patterns he’s found in his data, constitute a very strong argument for conceptualizing literary history as an evolutionary phenomenon. That’s what my 3QD post is about, a fairly detailed (a handful of new visualizations) reanalysis of Jockers’ account of literary influence.
From Influence to Evolution
It is one thing to track influence among a handful of texts; that is the ordinary business of traditional literary history. You read the texts, look for similar passages and motifs, read correspondence and diaries by the authors, and so forth, and arrive at judgements about how the author of some later text was influenced by authors of earlier texts. It’s not practical to do that for over 3000 texts, most of which you’ve never read, nor has anyone read many, or even most, of them in over 100 years.
Here, in brief, is what Jockers did: He assumed that, if Author X was influenced by Author Q, then X’s texts would be very similar to Q’s. Given the work he’d already done on stylistic and thematic features, it was easy for Jockers to combine those features into a single list comprising almost 600 features. With each text scored on all of those features it was then relatively easy for Jockers to calculate the similarity between texts and represent it in a directed graph where texts are represented by nodes and similarity by the edges between nodes. The length of the edge between two texts is proportional to their similarity.
Note, however, that when Jockers created the graph, he did not include all possible edges. With 3346 nodes in the graph, the full graph where each node is connected to all of the others would have contained millions of edges and been all but impossible to deal with. Jockers reasoned that only where a pair of books was highly similar could one reasonably conjecture an influence from the older to the newer. So he culled all edges below a certain threshold, leaving the final graph with only 165,770 edges (p. 163).
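For readers who like to see the procedure spelled out, here is a minimal sketch of that kind of thresholded similarity graph. It is not Jockers’ code. The function names, the use of Euclidean distance as the (dis)similarity measure, the toy feature vectors, and the cutoff value are all my assumptions for illustration. Publication year is used only to point each edge from the older text to the newer one; it plays no part in the similarity calculation.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Distance between two feature vectors (assumed measure, smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def influence_edges(texts, max_distance):
    """Keep only highly similar pairs and orient each edge older -> newer.

    texts: list of (title, year, feature_vector) tuples (hypothetical layout).
    max_distance: hypothetical cutoff; pairs farther apart are culled.
    """
    edges = []
    for (t1, y1, f1), (t2, y2, f2) in combinations(texts, 2):
        d = euclidean(f1, f2)
        if d <= max_distance:
            older, newer = (t1, t2) if y1 <= y2 else (t2, t1)
            edges.append((older, newer, d))
    return edges


# Size of the full graph for the corpus described above.
N_TEXTS = 3346
all_pairs = math.comb(N_TEXTS, 2)  # every unordered pair of texts
kept_edges = 165_770               # figure reported in the book (p. 163)
share_kept = kept_edges / all_pairs

# Toy example with made-up two-feature vectors.
toy = [
    ("A", 1800, [0.0, 0.0]),
    ("B", 1810, [0.0, 1.0]),
    ("C", 1890, [5.0, 5.0]),
]
toy_edges = influence_edges(toy, max_distance=1.5)
```

If the edges are counted as unordered pairs, the full graph would have about 5.6 million of them, so the culled graph keeps roughly 3 percent. In the toy example only the A to B edge survives, since C is far from both of the others.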
When Jockers visualized the graph (using Force Atlas 2 in Gephi) he found, much to his delight, that the graph was laid out roughly in temporal order from left to right. And yet, as he points out, there is no date information in the data itself, only information about some 600 stylistic and thematic features of the novels. What I argue in my 3QD post is that that in itself is evidence that 19th century literary culture constitutes an evolutionary system. That’s what you would expect if literary change were an evolutionary process.
This year’s Nacht van Kunst en Kennis Science Festival in Leiden features an experiment on language evolution. Come and take part in our interactive iterated learning experiment at the Museum Boerhaave from 19:30 on Saturday 20th September.
You can read more about the experiment at the Taal in de reageerbuis page.
Back in 2010 I wrote a piece for the National Humanities Center (USA), Cultural Evolution: A Vehicle for Cooperative Interaction Between the Sciences and the Humanities, which is online at their Forum along with comments. I have since revised it to include a section on Jockers, Macroanalysis: Digital Methods in Literary History (2013). You can download the revised version from my SSRN page. I’ve placed the added section below.
* * * * *
A Start: 19th Century Anglophone Literary Culture
Let me set the stage by quoting a passage from the excellent review Tim Lewens (2014) wrote for the Stanford Encyclopedia of Philosophy:
The prima-facie case for cultural evolutionary theories is irresistible. Members of our own species are able to survive and reproduce in part because of habits, know-how and technology that are not only maintained by learning from others, they are initially generated as part of a cumulative project that builds on discoveries made by others. And our own species also contains sub-groups with different habits, know-how and technologies, which are once again generated and maintained through social learning. The question is not so much whether cultural evolution is important, but how theories of cultural evolution should be fashioned, and how they should be related to more traditional understandings of organic evolution.
Building on discoveries made by others, we can see that kind of process in a graphic that Matthew Jockers used late in Macroanalysis: Digital Methods in Literary History (2013), though that’s not what Jockers had in mind in that particular investigation. He was working with a corpus of 3346 Nineteenth Century novels by American, British, Irish and Scottish authors and was interested in tracking influence among them. It is one thing to track influence among a handful of texts; that is the ordinary business of traditional literary history. You read the texts, look for similar passages and motifs, read correspondence and diaries by the authors, and so forth, and arrive at judgements about how the author of some later text was influenced by authors of earlier texts.
It’s not practical to do that for over 3000 texts, most of which you’ve never read, nor has anyone read them in over 100 years. Jockers was using recently developed techniques for analyzing “big data,” in this case, a pile of 19th Century Anglophone novels. Without going into the details (you can find most of them in Jockers, pp. 156 ff.), Jockers had the computer ‘measure’ each text on almost 600 different traits and then calculated the pair-wise similarity of all the texts. He then tossed out all values below a certain relatively high threshold and then had the computer create a network visualization of the remaining connections. Each text is represented as a ‘node’ in the network and the similarity between two texts is represented by the ‘edge’ (or link) connecting them. The length of the edge is proportional to the degree of similarity. Jockers then had the computer create a visualization of this network, where each text would be next to similar texts in the resulting image. Here’s that image (Figure 9.3 in the book, p. 165, color version from the web):
It turns out that the visualization routine laid the graph out more or less in chronological order, going from older to newer, left to right. Note that there was no temporal information in the data from which that graph was derived (pp. 164-65):
The fact that they line up in a chronological manner is incidental, but rather extraordinary. The chronological alignment reveals that thematic and stylistic change does occur over time. The themes that writers employ and the high-frequency function words they use to build the frameworks for their themes are nearly, but not always, tethered in time. At this macro scale, style and theme are observed to evolve chronologically, and most books and authors in this network cluster into communities with their chronological peers. Not every book and not every author is a slave to his or her epoch.
On Jockers’ first sentence, it’s neither incidental nor extraordinary IF an evolutionary process regulates cultural change. For evolution proceeds through “descent with modification,” as Darwin put it, and that goes for cultural as well as biological evolution. If a later individual is modified from its immediate predecessors, it will in fact resemble them a great deal; the modifications do not change the basic character of the descendants.
As his language indicates, Jockers wasn’t looking for THAT result. It surprised him. Though he alludes to cultural evolution here and there in the book, he rejected it as a basic premise of his investigation (pp. 171-172). The evolutionary interpretation is mine, not his.
We must further realize that that interpretation is an assertion about the collective mentality. Jockers wasn’t examining the minds of millions of 19th century readers of English-language novels in Britain and America, but the history of those novels is a function of the tastes and interests of those readers. Those books wouldn’t have been written if publishers didn’t think they could sell them to the public. Those tastes changed gradually, with the themes and styles of novels appealing to those tastes changing gradually as well.
The study of cultural evolution is thus the study of collective mentality. We are interested in the collective psyche. How can we think of the collective psyche without falling into hopeless mysticism?
Matthew L. Jockers. Macroanalysis: Digital Methods & Literary History. University of Illinois Press, 2013. x + 192 pp. ISBN 978-0252-07907-8
I’ve compiled all the posts into a working paper. HERE’s the SSRN link. Abstract and introduction below.
* * * * *
Abstract: Macroanalysis is a statistical study of a corpus of 3346 19th Century American, British, Irish, and Scottish novels. Jockers investigates metadata; the stylometrics of authorship, gender, genre, and national origin; themes, using a 500 item topic model; and influence, developing a graph model of the entire corpus in a 578 dimensional feature space. I recast his model in terms of cultural evolution where the dynamics are those of blind variation and selective retention. Texts become phenotypical objects, words become genetic objects, and genres become species-like objects. The genetic elements combine and recombine in authors’ minds but they are substantially blind to audience preferences. Audiences determine whether or not a text remains alive in society.
* * * * *
Introduction: Get in the Driver’s Seat
I knew it was going to be good. But not THIS good. A better formulation: I didn’t know it would be good in THIS way, that it would put me in the driver’s seat, if only in a limited way.
The driver’s seat, you ask, what do you mean? In this case it means that I could actively work with the data. When, for example, I read Moretti’s Graphs, Maps, Trees, I read it as I do pretty much any book, though this one had a bunch of charts and diagrams, which is unusual for literary criticism. There wasn’t anything for me to do other than just read.
If I didn’t have ready access to the web, reading Macroanalysis would have been the same. But I do have web access and I use it all the time. So, when I got to Chapter 8, “Theme,” I also accessed the topic browser that Jockers had put on the web. Through this browser I could explore the topic model Jockers used in the book and, in particular, I could use it to investigate matters that Jockers hadn’t considered.
So I moved from thinking about Jockers’ work to using his work for my own intellectual ends. I ended up writing four posts (6.1 – 6.4) on that material totaling almost 12,000 words and I don’t know how many charts and graphs, all of which I got from Jockers’ web site. Once I’d worked through an initial curiosity about a spike that looked like Call of the Wild (but wasn’t, because that text isn’t in the database) I settled into some explorations framed by Leslie Fiedler’s Love and Death in the American Novel, Melville’s Moby Dick, and Edward Said’s anxiety on behalf of the autonomous existence of the aesthetic realm.
Data is Independent of Interpretations
You can do that as well, or whatever you wish. While the web browser gives you only limited access to Jockers’ corpus, that access is real and useful. A lot of work in digital criticism, and digital humanities in general, is like that. It produces ‘knowledge utilities’ that are generally useful, not just the private preserves of the original investigator.
There is an important epistemological point here as well. Jockers was led to this work by a certain set of intellectual concerns. Some of those concerns are quite general–about literature and the novel–while others are more specific–he has a particular interest in Irish and Irish-American literature. But I had no trouble putting his results to use in service of my own somewhat different interests.
The purpose of this post is to recast the work reported in Macroanalysis: Digital Methods & Literary History in terms appropriate to cultural evolution. The idea is to propose a model of cultural evolution and assign objects from Jockers’ analysis to play roles in that model. I will leave Jockers’ work untouched. All I’m doing is reframing it.
Before doing that, however, I should note that in the last quarter of a century or so there has been quite a lot of work on cultural evolution in a variety of disciplines including linguistics, anthropology, archaeology, and biology. Though it must be done at some time, I have no intention of even attempting to review that work here and so to place the scheme I propose in relation to it. That’s a job for another time and another venue. I note, however, that I have done quite a bit of work on cultural evolution myself and that some of that discussion can be found in documents I list at the end of this post.
First of all, why bother to recast the processes of literary history in evolutionary terms at all? Jockers wrote an excellent book without creating an evolutionary model, though he mentioned evolution here and there. What’s to be gained by this recasting?
As far as I can tell, much of the work that has been done on cultural evolution has been undertaken simply to exercise and extend the range of evolutionary discourse. It has not, as yet, resulted in an understanding of cultural process that is deeper than more conventional forms of historical discourse. Much of my own work has been undertaken in this spirit. I believe that, yes, at some point, evolutionary explanation will prove more robust than other forms of explanation, but we’re not there yet.
This work in effect is looking to evolutionary accounts as exhibiting something like formal cause in Aristotle’s sense. Evolutionary accounts are about distribution of traits across populations. In biology such accounts have a characteristic formal appearance so that, e.g. phylogenetic analysis of a population of entities tends to “look” a certain way. So, in the cultural sphere, let’s conduct a similar analysis and see how things look even if we don’t have our entities embedded in the kind of causal framework that genetics and population biology, molecular biology, and developmental biology provide the biologist.
That’s fine, as long as we remind ourselves periodically that that’s what we’re doing. But we must keep looking for the terms in which to construct a causal model.
What I specifically want from an evolutionary approach to culture is
That’s what I want. Those requirements imply having a causal model. Whether or not I’ll get it, that’s another matter.
Current critical approaches, however, in which individual humans are but nodal points in the machinations of vast and impersonal hegemonic forces, have trouble on all these points. Individual human beings are deprived of agency thus turning readers into zombies watching the ghosts of dead authors flicker on the remaining walls of Plato’s cave. The canon is captive to those same hegemonic forces, which have promulgated Shelley’s defense as an opiate for the masses, which R’ us.
The critical machine is broken. It’s time to start over. Before we do that, however, I need to dispense with one objection to seeking an evolutionary account of cultural phenomena.
Over at New Savanna I’ve been blogging my way through Matthew Jockers, Macroanalysis: Digital Methods & Literary History, University of Illinois Press, 2013. I figured this particular post would be of interest here. If you’re not familiar with topic analysis, there are some links below that’ll help you out.
Chapter 8 of Macroanalysis is about “Theme.” Jockers uses topic analysis to investigate the occurrence of 500 ‘themes’ in a corpus of 3,346 19th-century British, American, and Irish books. He opens with a bit of intellectual history, from the Russian Formalists to Google’s Ngrams; then he launches into topic analysis, which emerged at the turn of the millennium; he gives some simple examples, and then he gets serious. But I’m going to skip over all of that for now.
For one thing, I’ve been through the topic analysis drill several times in the past year or so and don’t want to go through it again. If you need an introduction or a review, check out Topic Models: Strange Objects, New Worlds, or, in this series, Reading Macroanalysis 5: An Interlude on Scale: Micro, Meso, and Macro. For another, Jockers has put a topic tool online, 500 Themes from a corpus of 19th-Century Fiction. Those are the topics he discusses in this chapter.
Once I was done reading the chapter I started playing with the tool. I’d pick a topic and then look at the graphics:
At first I was just browsing, moving from one theme to the next. But then I hit one that grabbed my attention.
So I spent the next couple of hours looking at themes and thinking about them. I’m going to devote the rest of this post and the next one showing what I found. Then I’ll do a third post where I review what Jockers found and recast the enterprise in terms of cultural evolution.
Note that in all of this I’m just playing around, but in a serious way. It is all preliminary and provisional. I haven’t reached any firm conclusions on the particular themes I look at. The only thing I’m sure about is that this, and similar techniques, are going to revolutionize the way we do literary history.
Before proceeding, however, two caveats are necessary. While Jockers’ corpus is substantial, it isn’t every British, American, and Irish novel written in the 19th Century. Perhaps more important, it is natural to read these theme charts as reflecting the interests of the 19th Century reading public. And in some sense that is so. But we have to be careful.
For some of these books were more widely read than others and a few of them, the canonical ones, are still being read. But the extent of a book’s readership is not reflected in the data. The fact that a book was published at all implies, of course, that someone thought there was an audience for it. But a publisher’s interest isn’t quite the same as a reader’s interest. We simply don’t know how accurately publisher interest tracks reader interest. With those reservations in mind, let’s take a look.
Of Dogs and Gold
In the course of browsing through Jockers’ themes menu I saw “DOGS.” Let’s look at that, I thought. Why dogs? you may ask. No deep reason, but some years ago, way back in graduate school in fact, I’d noticed that dogs figured as a significant motif in Wuthering Heights. Major transitions among humans were marked by violence between dogs and humans (e.g. Lockwood arrives and is greeted by a barking dog, Catherine gets bitten by Skulker; see this post). More recently, I’d read a handful of articles about the domestication of dogs during human evolutionary history. I was just curious.
Here’s the word cloud for the DOGS topic:
The following graph stunned me. It depicts the occurrence of the dog topic by author’s gender over the course of the century. The medium gray line depicts male authors, the black line females, and the light gray line, authors where the gender was undetermined.
What’s that spike at the right edge? As soon as I realized that it was for male authors I thought, “Jack London, Call of the Wild.” I also had some doubts as to whether that book was in the corpus, as I didn’t believe the book was 19th Century, though I wasn’t sure. But that doubt didn’t stop me from nosing around. By the time I’d confirmed for myself that it wasn’t 19th Century (it was published in 1903) and Jockers had gotten back to me that, no, it wasn’t in the corpus, I’d already had too much fun browsing through the charts and had moved on to other topics (which I’ll get to in the next post).
Languages can use pitch to make lexical contrasts (so-called tone languages) or to mark contrasts at the utterance level, usually called intonation, such as using rising pitch to indicate a question as opposed to a statement. In fact, a language can use pitch to do both by various means such as changes in pitch range. However, lexical tone and intonation are often seen as mechanisms that compete for pitch resources. Yip (2002) holds that “it is commonplace that many lexical tone languages avoid the potential conflicts between intonation and lexical tone by using a different mechanism altogether: the sentence-final particle.”
Can we see the evolutionary effects of this dependency in the typology of the world’s languages? (at the very least, the terminology is in competition! I’ll use ‘intonation’ to mean phrase-level pitch)
The ARC Centre of Excellence for the Dynamics of Language is offering a number of PhD positions, including on the topic of language evolution. The positions are hosted at ANU in Canberra, the University of Melbourne and the University of Queensland. These are on top of the Wellsprings of Diversity positions.
From the website:
The Evolution program will engage with central questions about the evolution of language across scales that range from the whole span of human evolution to the adaptations that occur as speech capacities are lost in speech-impaired individuals. This program will explore what possible structures languages can develop, how learning and processing biases shape the direction of evolution, what is the role of the speech community in language evolution, and how insights from language evolution can help develop more flexible ways of robots learning speech.
Details can be found in the pdfs below.
Articles from the first edition of the Annual Review of Linguistics are appearing online this week. Bob Ladd, Dan Dediu and I wrote a review of correlations in linguistics.
We review a number of recent studies that have identified either correlations between different linguistic features (e.g., implicational universals) or correlations between linguistic features and nonlinguistic properties of speakers or their environment (e.g., effects of geography on vocabulary). We compare large-scale quantitative studies with more traditional theoretical and historical linguistic research and identify divergent assumptions and methods that have led linguists to be skeptical of correlational work. We also attempt to demystify statistical techniques and point out the importance of informed critiques of the validity of statistical approaches. Finally, we describe various methods used in recent correlational studies to deal with the fact that, because of contact and historical relatedness, individual languages in a sample rarely represent independent data points, and we show how these methods may allow us to explore linguistic prehistory to a greater time depth than is possible with orthodox comparative reconstruction. Whether researchers are for or against these new techniques, understanding them is becoming increasingly necessary to interface with discussions in the field.
One of the most fun parts of putting the paper together was drawing this diagram (below) of all the links that we discuss. It turns out that there are a lot of complicated links between linguistic and social variables! I’m currently working on methods to disentangle this web.
We also include three appendices as supplementary materials. First, a list of electronic databases relevant for cross-cultural statistical comparisons. Secondly, a very brief introduction to statistical hypothesis testing, which could be useful for linguists who are not familiar with statistical approaches. Thirdly, a discussion of robustness and validity in statistical approaches to linguistics.
Other reviews also look interesting, for example, Johansson on Language abilities of Neandertals, Fisher and Vernes on genetics and linguistics, de Vos on village sign languages and Kroll et al. on bilingualism.
Ladd, D. R., Roberts, S. G., and Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1(1).