A few weeks ago, Roger Blench gave a talk at the MPI entitled “‘New mathematical methods’ in linguistics constitute the greatest intellectual fraud in the discipline since Chomsky”. The title is controversial, to say the least! The talk argued, amongst other things, that phylogenetic methods are less transparent and less replicable than traditional historical reconstruction. Here I argue against those points.
Having attended the talk, I can say that the title is clearly tongue-in-cheek. The talk was quite light-hearted and he was very easy to talk to. He certainly didn’t accuse any particular researcher of actual fraud (“fraud” only appears in the title, not in the actual talk). By “fraud”, he seems to mean “overblown claims not supported by evidence” (and he is not alone in this view of phylogenetics). However, I was surprised to find that the talk slides were made public, where things can be taken out of context.
It was a fun talk to be at, because it made clear to me why some people distrust phylogenetic methods. Roger Blench made 3 basic points (the points were made in detail with examples, but I will summarise them here). He argued that phylogenetic methods were useless because they (1) were not replicable and (2) were not transparent. Regarding (1), he argued that changes to parameters or data can lead to differences in results (though, to me, this might be better characterised as a lack of robustness rather than a lack of replicability), and that it’s impossible to tell which results are better. Regarding (2), he argued that the method itself is opaque and mysterious, because it relies on cognacy judgements which are quite subjective (he gave an example of one historical linguist who claimed that it was much easier to find cognates after lunch!). That is, garbage in, garbage out.
He also argued that many phylogenetic studies don’t try to validate their trees by aligning them with anthropological or archaeological data (and I agree that several studies could be improved by doing such validation). That is, the graphs look pretty, but to an expert linguist they don’t reveal anything new. More specifically, they offer limited falsifiability. Worse, they may actually mislead the reader if the visualisations do not reflect actual processes of change. Blench argued that the pretty graphs have ‘bamboozled’ the editors and reviewers of high-impact journals (who do not have expertise in linguistics) into accepting the studies.
Blench stressed that he is not anti-quantitative, nor a Luddite (indeed, I talked with him afterwards about a quantitative method for tracing borrowings, for which he had several good insights), but suggested that he feels like he hasn’t learned anything from phylogenetics.
In my mind, transparency and replicability are the strong points of phylogenetic techniques. Statistical methods require precise definitions. This includes defining the assumptions behind the analysis, defining the measurements and defining how the assumptions and measures lead to the conclusion (the method). Therefore, another researcher should be able to reproduce the precise results of a study given the same data, assumptions and method. For example, given the same data, parameter settings and random starting seeds, two researchers could produce precisely the same phylogenetic tree using a Bayesian phylogenetic approach. This means the results can be replicated, an important step in any science.
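To make the point about random seeds concrete, here is a minimal sketch in Python. It is not a Bayesian phylogenetic analysis: it is a toy Metropolis sampler that estimates a single probability from made-up data, and the names `SHARED` and `run_chain` are hypothetical. It only illustrates that a stochastic method becomes exactly repeatable once the data, settings and seed are fixed.

```python
import math
import random

# Hypothetical toy data (an assumption for illustration): 1 means two
# languages share a cognate for a meaning, 0 means they do not.
SHARED = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]


def log_likelihood(p, data):
    """Bernoulli log-likelihood of the data given sharing probability p."""
    k = sum(data)
    n = len(data)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def run_chain(data, seed, steps=2000, step_size=0.1):
    """Random-walk Metropolis sampler for p with a uniform prior on (0, 1)."""
    rng = random.Random(seed)  # all randomness comes from this seeded generator
    p = 0.5
    samples = []
    for _ in range(steps):
        proposal = p + rng.uniform(-step_size, step_size)
        # Proposals outside (0, 1) have zero prior probability: reject them.
        if 0.0 < proposal < 1.0:
            log_ratio = log_likelihood(proposal, data) - log_likelihood(p, data)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                p = proposal
        samples.append(p)
    return samples
```

Calling `run_chain(SHARED, seed=42)` twice returns identical lists of samples, while changing the seed changes the individual samples. Publishing the data, the settings and the seed is what lets a second researcher get exactly the same output.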
Furthermore, while some statistical methods may seem opaque without a knowledge of mathematics, they can be precisely communicated. They are arguably more transparent than analyses which were the result of an individual researcher combining deep knowledge of several domains without fully explaining the process. For example, cognacy judgements often rely on a deep knowledge of the language as well as its history and culture and the surrounding geopolitical landscape. These judgements are invaluable for subsequent quantitative work, yet the data and assumptions that go into the judgements are often left implicit. This obscures the research method and also makes it difficult to reproduce the same results. This can lead to disagreements that focus on the skill or authority of the researcher, which is not productive.
In contrast, when assumptions, measures, methods and results are precisely defined, researchers can focus on them directly. For example, if a researcher takes issue with one of the assumptions, they can make a different assumption, then use the same measures and methods to produce alternative results. The two results can be directly compared to determine whether the assumption has a crucial impact on the results, or whether the alternative assumption leads to better results (e.g. a better ‘fit’ to the data or a more efficient explanation). Researchers can also directly test the impact of certain data or steps in the method. That is, arguments can focus on core scientific elements rather than on the opinion or prestige of researchers. In this way, quantitative methods can achieve better transparency and replication in a productive way.
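As a rough illustration of comparing two assumptions on the same data, the sketch below fits two toy models to made-up cognate retention counts: one assumes a single retention rate for all vocabulary, the other assumes separate rates for basic and cultural vocabulary. The counts, the function names and the choice of AIC as the measure of ‘fit’ are all assumptions for the sake of the example, not something taken from any particular study.

```python
import math

# Hypothetical counts (made up for illustration):
# vocabulary class -> (cognates retained, meanings compared)
DATA = {"basic": (45, 50), "cultural": (20, 50)}


def bernoulli_loglik(k, n, p):
    """Log-likelihood of k retentions out of n meanings at retention rate p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


def fit_single_rate(data):
    """Assumption A: one retention rate shared by every vocabulary class."""
    total_k = sum(k for k, _ in data.values())
    total_n = sum(n for _, n in data.values())
    p = total_k / total_n  # maximum-likelihood estimate of the shared rate
    loglik = sum(bernoulli_loglik(k, n, p) for k, n in data.values())
    return loglik, 1  # (log-likelihood, number of free parameters)


def fit_separate_rates(data):
    """Assumption B: each vocabulary class has its own retention rate."""
    loglik = sum(bernoulli_loglik(k, n, k / n) for k, n in data.values())
    return loglik, len(data)


def aic(loglik, n_params):
    """Akaike information criterion: lower is better, extra parameters cost."""
    return 2 * n_params - 2 * loglik


single = fit_single_rate(DATA)
separate = fit_separate_rates(DATA)
print("single rate AIC:   ", round(aic(*single), 2))
print("separate rates AIC:", round(aic(*separate), 2))
```

With these made-up counts the two-rate model has the lower AIC, so the disagreement becomes a question about a specific assumption that anyone can rerun, rather than a question about who is making the claim.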
However, this is not to say that qualitative judgements cannot achieve this potential. All that is required is that the assumptions, methods and underlying measures are precisely defined. For example, while there is a large amount of disagreement in the classification of languages into historical trees, the Glottolog classification has a rigidly defined sequence of judgements that decide how to place a language or dialect on a tree. If you accept the assumptions behind this process, then you should accept the classification. If you disagree with the classification, you should be able to identify either an assumption or a specific judgement that you don’t agree with. You can then bring evidence against this particular assumption or judgement and determine how the classification should change. There is no need to reject the entire tree, nor necessarily any other classification.
In a similar way, the decisions that go into seemingly more subjective measures such as cognacy judgements could be explicitly stated. For example, the LexStat method (List, 2012) is a computational method for identifying cognates based on linguistic criteria and a set of assumptions (implemented in LingPy). It produces replicable judgements which are derived in a transparent way. It’s probable that expert historical linguists would disagree with the results obtained, but rather than dismissing the method, they should be able to define an additional set of data and assumptions which would produce more agreeable results. This might include coding archaeological or anthropological evidence that certain languages were or were not in contact, known population movements or knowledge about certain semantic domains that refer to items that were traded. This is essentially quantifying the knowledge in a way that could be used with the statistical methods. Historical linguists have a wealth of critical knowledge about language, and this could find much broader impact if it were combined with the transparency and reproducibility of quantitative methods.
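To show what an explicitly stated cognacy procedure can look like, here is a deliberately crude sketch. It is not LexStat and does not use LingPy: it swaps in normalised edit distance on lowercase orthographic forms, followed by single-linkage clustering under a threshold. The function names, the threshold values and the choice of word forms for ‘hand’ are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer form."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_cognates(forms, threshold):
    """Group languages whose forms are linked by distances <= threshold.

    forms: dict mapping language name -> word form.
    Returns a dict mapping language name -> cluster id (single linkage).
    """
    parent = {lang: lang for lang in forms}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(sorted(forms), 2):
        if normalized_distance(forms[a], forms[b]) <= threshold:
            parent[find(a)] = find(b)

    roots = sorted({find(lang) for lang in forms})
    ids = {root: i + 1 for i, root in enumerate(roots)}
    return {lang: ids[find(lang)] for lang in sorted(forms)}


# Lowercase orthographic forms for 'hand' (illustrative input only).
HAND = {
    "English": "hand", "German": "hand", "Dutch": "hand",
    "French": "main", "Spanish": "mano", "Italian": "mano",
}

print(cluster_cognates(HAND, threshold=0.4))
print(cluster_cognates(HAND, threshold=0.5))
```

The same input and threshold always give the same judgements, and the weaknesses are out in the open. With a threshold of 0.4 this toy procedure separates French main from Spanish and Italian mano, and with 0.5 it merges all six forms into one set. An expert who finds either result unacceptable can point at the distance measure or the threshold specifically and propose a replacement.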
Having said this, the talk by Blench makes it clear that this kind of synergy is not taking place. Those using mathematical models may have to spend more time justifying and clarifying their work. At the same time, learning the mathematical principles is not so hard. As we argued in our paper on correlational studies, understanding the mathematical methods in linguistics is becoming more relevant not only for conducting research, but also for engaging in debate.
On a positive note, I understand that Blench is working together with an evolutionary biologist to work out a mathematical model which reflects their assumptions and theories about how languages change and diversify. I look forward to seeing this model and how it compares to phylogenetic models.
The talk slides are quite easy to follow if you’d like more detail on Blench’s arguments.