Systematic reviews 101: Internal and External Validity

Who remembers last summer when I started writing a series of posts on systematic literature reviews?

I apologise for neglecting it for so long, but here is a quick write up on assessing the studies you are including in your review for internal and external validity, with special reference to experiments in artificial language learning and evolutionary linguistics (though this is relevant to any field which aspires to adopt scientific method).

In the first post in the series, I outlined the differences between narrative and systematic reviews. One of the defining features of a systematic review is that it is not written with a specific hypothesis in mind. The literature search (which my next post will be about) is conducted with predefined inclusion criteria and, as a result, you will end up with a pile of studies to review regardless of there conclusion, or indeed regardless of there quality. Due to a lack of a filter to catch bad science, we need methods to assess the quality of a study or experiment which is what this post will be about.

(This will also help with DESIGNING a valid experiment, as well as assessing the validity of other people’s.)

What is validity?

Validity is the extent to which a conclusion is a well-founded one given the design and analysis of an experiment. It comes in two different flavours: external validity and internal validity.

External Validity

External validity is the extent to which the results of an experiment or study can be extrapolated to different situations. This is EXTREMELY important in the case of experiments in evolutionary linguistics because the whole point of experiments in evolutionary linguistics is to extrapolate your results to different situations (i.e. the emergence of linguistic structure in our ancestors), and we don’t have access to our ancestors to experiment on.

Here are some of things that effect an experiment’s external validity (in linguistics/psychology):

  • Participant characteristics (age (especially important in language learning experiments), gender, etc.)
  • Sample size
  • Type of learning/training (important in artificial language learning experiments)
  • Characteristics of the input (e.g. the nature of the structure in an input language)
  • Modality of the artificial language (how similar to actual linguistic modalities?)
  • Modality of output measures (how the outcome was measured and analysed)
  • The task from which the output was produced (straightforward imitation or communication or some other task)

Internal Validity

Internal validity is how well an experiment reduces its own systematic error within the circumstances of the experiment being performed.

Here are some of things that effect an experiment’s internal validity:

  •  Selection bias (who’s doing the experiment and who gets put in which condition)
  • Performance bias (differences between conditions other than the ones of interest, e.g. running people in condition one in the morning and condition two in the afternoon)
  • Detection bias (how the outcomes measures are coded and interpreted, blinding which condition a participant is in before coding is paramount to reduce the researcher’s bias to want to find a difference between conditions. A lot of retractions lately have been down to failures to act against detection bias.)
  • Attrition bias (Ignoring drop-outs, especially if one condition is especially stressful, causing high drop-out rates and therefore bias in the participants who completed it. This probably isn’t a big problem in most evolutionary linguistics research, but may be in other psychological stuff.)

Different types of bias will be relevant to different fields of research and different research questions, so it may be an idea to come up with your own scoring method for validity to subject different studies to within your review. But remember to be explicit about what your scoring methods are, and the pros and cons of the studies you are writing about.

Hopefully this introduction will have helped you think about validity within experiments in what you’re interested in, and helped you take an objective view on assessing the quality of studies you are reviewing, or indeed conducting.