Fun Language Experiment: Results

Two days ago I ran a pilot experiment online from Replicated Typo.  Thanks to all who took part. It's a bit cheeky to exploit our readers, but it's all in the name of science.  Unfortunately, the pilot was a complete failure. Suggestions and comments are welcome.

The experiment investigated the role of variation in language learning.  Here's what I was up to (plus the source code for running similar experiments):

Language learners must learn to detect conditioned variation at several levels of abstraction in their language input (phonetic, phonological, lexical, syntactic etc.).  Although there may be much variation between speakers at lower levels (e.g. accent), the more abstract levels (e.g. lexical, syntactic) are set by consensus between speakers.  Receiving input from multiple speakers may help learners by highlighting relevant variation.  For example, a learner who receives input from only one speaker may not be able to tell which variation is idiosyncratic to that speaker and which is systematic for the population.  Comparing input across speakers makes variation that is conditioned by the context, rather than by the speaker, more obvious.  Specifically, higher-order conditioned variation should be easier to infer from multiple speakers.  This might also be related to the reported bilingual advantage in metalinguistic awareness (e.g. Bialystok, 1991).

Gomez (2002) shows that increased variation in adjacent dependencies (probabilities of words occurring next to each other) can cause learners to pick up on long-distance dependencies (probabilities of words occurring in the same sentence, but with other words in between).  I hypothesise that input from multiple speakers gives better cues to higher-order dependencies than input from one speaker alone.
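The difference between the two kinds of dependency can be made concrete by estimating both from a small corpus. Below is a minimal PHP sketch (PHP being the language of the experiment code later in this post). It assumes every string has exactly three words: a first word, a middle X word and a final word. The function names and the toy corpus are hypothetical and serve only as illustration.

```php
<?php
# Hypothetical helper: estimate adjacent dependencies, P(next word | word),
# and non-adjacent ones, P(last word | first word), from a list of strings.
# Assumes every string is three space-separated words: first, X, last.
function dependencyProbabilities(array $strings): array
{
    $adjacent = [];     # $adjacent[w1][w2]: times w2 directly followed w1
    $nonAdjacent = [];  # $nonAdjacent[first][last]: times first and last framed a string
    foreach ($strings as $string) {
        $w = explode(' ', $string);
        for ($i = 0; $i + 1 < count($w); $i++) {
            $adjacent[$w[$i]][$w[$i + 1]] = ($adjacent[$w[$i]][$w[$i + 1]] ?? 0) + 1;
        }
        $first = $w[0];
        $last = $w[count($w) - 1];
        $nonAdjacent[$first][$last] = ($nonAdjacent[$first][$last] ?? 0) + 1;
    }
    return [
        'adjacent' => toConditional($adjacent),
        'nonadjacent' => toConditional($nonAdjacent),
    ];
}

# Turn nested counts into conditional probabilities (each row sums to 1).
function toConditional(array $counts): array
{
    $probs = [];
    foreach ($counts as $given => $row) {
        $total = array_sum($row);
        foreach ($row as $next => $n) {
            $probs[$given][$next] = $n / $total;
        }
    }
    return $probs;
}

# Toy corpus: one frame, pel ... chila, around five different X words.
$corpus = [];
foreach (['wadim', 'fengle', 'coomo', 'loga', 'gople'] as $x) {
    $corpus[] = "pel $x chila";
}
$p = dependencyProbabilities($corpus);
printf("P(wadim | pel) = %.2f\n", $p['adjacent']['pel']['wadim']);                  # 0.20
printf("P(last = chila | first = pel) = %.2f\n", $p['nonadjacent']['pel']['chila']); # 1.00
```

With five interchangeable middle words, each adjacent transition from pel has probability 0.2, but pel predicts chila perfectly across the gap. Adding more middle words would lower the adjacent probabilities further and leave the long-distance one untouched. Gomez (2002) manipulated exactly this kind of variability.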


The pilot was based on Gomez (2002).  Participants listened to strings from an artificial language with consistent long-range dependencies spoken by either one or two speakers.  They were then tested on strings they had previously heard and novel strings that violated the dependencies.  The prediction was that listeners exposed to two speakers would learn the long-range dependencies better, and therefore score higher in the test phase.

Two artificial languages, L1 and L2, were created using the following grammar (the first five rules generate L1 and the last five generate L2):

S -> aXg
S -> bXh
S -> dXi
S -> eXj
S -> fXk
S -> aXh
S -> bXi
S -> dXj
S -> eXk
S -> fXg

Here a = 'pel', b = 'rot', c = 'dak', d = 'rud', e = 'jic', f = 'tood', g = 'chila', h = 'plizet', i = 'feenam' and j = 'deecha'.  X was drawn from a set of five two-syllable words ('wadim', 'fengle', 'coomo', 'loga', 'gople').  These words come from Gomez (2002).  This gives a total of 25 strings in each language (five frames, each combined with five X words).  The languages have the same adjacent dependencies but different long-distance dependencies.
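The claim that the languages share adjacent dependencies but differ in long-distance ones can be checked mechanically. The PHP sketch below expands the grammar using its letter symbols rather than the spoken words. It assumes that the first five rules define L1 and the last five define L2. The function names are hypothetical.

```php
<?php
# Frames (first word, last word) for each language, in the grammar's letter symbols.
# Assumption: the first five rules are L1 and the last five are L2.
$xWords = ['wadim', 'fengle', 'coomo', 'loga', 'gople'];
$frames = [
    'L1' => [['a', 'g'], ['b', 'h'], ['d', 'i'], ['e', 'j'], ['f', 'k']],
    'L2' => [['a', 'h'], ['b', 'i'], ['d', 'j'], ['e', 'k'], ['f', 'g']],
];

# Expand every frame with every X word: 5 frames x 5 X words = 25 strings.
function expandLanguage(array $frames, array $xWords): array
{
    $strings = [];
    foreach ($frames as [$first, $last]) {
        foreach ($xWords as $x) {
            $strings[] = [$first, $x, $last];
        }
    }
    return $strings;
}

# Distinct adjacent word pairs (bigrams), sorted for comparison.
function adjacentPairs(array $strings): array
{
    $pairs = [];
    foreach ($strings as $w) {
        $pairs[] = "$w[0] $w[1]";
        $pairs[] = "$w[1] $w[2]";
    }
    $pairs = array_values(array_unique($pairs));
    sort($pairs);
    return $pairs;
}

# Distinct first...last pairs (the long-distance dependencies), sorted.
function framePairs(array $strings): array
{
    $pairs = [];
    foreach ($strings as $w) {
        $pairs[] = "$w[0]...$w[2]";
    }
    $pairs = array_values(array_unique($pairs));
    sort($pairs);
    return $pairs;
}

$l1 = expandLanguage($frames['L1'], $xWords);
$l2 = expandLanguage($frames['L2'], $xWords);
echo count($l1), ' ', count($l2), "\n";                              # 25 25
var_dump(adjacentPairs($l1) === adjacentPairs($l2));                 # bool(true)
var_dump(array_intersect(framePairs($l1), framePairs($l2)) === []); # bool(true)
```

Every first word and every final word occurs in both languages, so the two languages have identical sets of adjacent pairs. Only the pairing of first and final word across the X slot tells them apart.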

Two speakers (one male and one female, with different regional accents) were recorded saying each of the strings, with a short pause between words and a one-second pause between strings.


Twenty participants were recruited from the web.  Participants were told that they would be listening to sentences from an artificial language.  They were told that they would be tested on the language, but were given no specifics.

There were two training conditions.  In the one-speaker condition, participants heard the 25 strings from L1 spoken by the same speaker (different recordings of the same string for each repetition).  In the two-speaker condition, participants heard the same sequence of strings, but the speaker alternated from one sentence to the next.
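The two training conditions differ only in who speaks each string. Here is a minimal PHP sketch of that assignment. The speaker labels 'A' and 'B' and the function name are hypothetical placeholders and do not come from the experiment files.

```php
<?php
# Pair each training string with a speaker: one speaker throughout, or two
# speakers in strict alternation. 'A' and 'B' are hypothetical labels.
function assignSpeakers(array $strings, bool $twoSpeakers): array
{
    $sequence = [];
    foreach (array_values($strings) as $i => $string) {
        $speaker = ($twoSpeakers && $i % 2 === 1) ? 'B' : 'A';
        $sequence[] = ['string' => $string, 'speaker' => $speaker];
    }
    return $sequence;
}

$training = ['pel wadim chila', 'pel fengle chila', 'pel coomo chila'];
foreach (assignSpeakers($training, true) as $item) {
    echo $item['speaker'], ': ', $item['string'], "\n";
}
# A: pel wadim chila
# B: pel fengle chila
# A: pel coomo chila
```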

Participants then listened to twelve strings from the same speaker (in both conditions), six of which came from L1 and six of which came from L2.  The order of strings was randomised.  Participants indicated after each string whether they had heard it before or not.  The number of correct answers was recorded.
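Below is a PHP sketch of how such a test set could be assembled and scored. The post does not say which six strings from each language were used, so this version samples them at random. The function names and placeholder strings are hypothetical. Answers are coded 'Y' and 'N', as in the experiment script.

```php
<?php
# Build a test set: $perLanguage heard strings (L1, correct answer "Yes") and
# $perLanguage novel strings (L2, correct answer "No"), in random order.
# Assumption: items are sampled at random; the post does not say which were used.
function buildTestSet(array $heard, array $novel, int $perLanguage = 6): array
{
    $items = [];
    foreach ((array) array_rand($heard, $perLanguage) as $key) {
        $items[] = ['string' => $heard[$key], 'heard' => true];
    }
    foreach ((array) array_rand($novel, $perLanguage) as $key) {
        $items[] = ['string' => $novel[$key], 'heard' => false];
    }
    shuffle($items);  # randomise presentation order
    return $items;
}

# Count correct answers; $answers holds one 'Y' or 'N' per item, in order.
function countCorrect(array $items, string $answers): int
{
    $correct = 0;
    foreach ($items as $i => $item) {
        if ((($answers[$i] ?? '') === 'Y') === $item['heard']) {
            $correct++;
        }
    }
    return $correct;
}

# Placeholder strings standing in for the 25 strings of each language.
$l1 = array_map(function ($n) { return "L1 string $n"; }, range(1, 25));
$l2 = array_map(function ($n) { return "L2 string $n"; }, range(1, 25));

$items = buildTestSet($l1, $l2);
echo count($items), " test items\n";  # 12 test items
```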


Participants in the two conditions did not score significantly differently from each other or from chance (mean for one-speaker condition = 53%, mean for two-speaker condition = 52%; t = 0.2, p = 0.8).  In both conditions, participants scored higher on true positives than on true negatives, which suggests that they tended to judge all the presented sentences as familiar.  Participants in the two-speaker condition did score marginally higher on true negatives (mean for one-speaker = 23%, two-speaker = 35%; t = 1.5, p = 0.16).  That is, they were somewhat better at detecting sentences that they had not heard before, although the difference was not significant.
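The test contained equal numbers of heard and novel strings, so overall accuracy is simply the average of two rates. Let $H$ be the proportion of heard strings answered "yes" (true positives) and $C$ the proportion of novel strings answered "no" (true negatives); these symbols are introduced here for illustration. The following worked equation shows why a strong tendency to answer "yes" produces chance-level scores:

```latex
\text{accuracy} = \frac{6H + 6C}{12} = \frac{H + C}{2},
\qquad H = 1,\ C = 0 \;\Rightarrow\; \text{accuracy} = \frac{1 + 0}{2} = 0.5
```

A participant who judges every sentence familiar therefore scores exactly 50%. Near-chance accuracy is thus compatible with high true-positive rates combined with low true-negative rates.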


It's embarrassing to put up negative results, but that's what I get for doing it all in the public arena.

This pilot follows a previous one which had fewer 'words'.  The results were similar, but a few participants scored perfectly, so I made this pilot more difficult.  It seems that I made it too difficult.  The participants in Gomez (2002) had 18 minutes of training on only three long-distance dependencies, while those in this experiment had about one and a half minutes of training on five.  That is roughly six minutes per dependency versus about eighteen seconds, a twentyfold difference.  I decided that I couldn't trust participants recruited online to listen to more than a few minutes of training data.

I'm not convinced that the experiment won't work, but it's back to the drawing board.  Any suggestions or comments are most welcome.


Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13(5), 431-436. PMID: 12219809

Source Code for Experiment

The following PHP code creates a form with a hidden element that stores the participant's progress (number of rounds completed, order of stimuli, time spent on the experiment etc.).  The script sends this information, together with the participant's button presses, back to itself.  When the participant finishes, the results are appended to a file on the server.  Add the HTML and PHP tags around this source code:

$fileRoot = '';  # folder location of this file
$prefix = 'sounds/Test'; # location and prefix of sound files e.g. ./sounds/Test1.mp3
$nameOfThisFile = 'test3.php';  # name of this file (used for sending data to itself)
$numTests = 12;
$numConditions = 4;

# Print an embedded Flash audio player for the given sound file
# (fill in src with the URL of a Flash MP3 player)
function player($fname){
  $playerA = '<embed type="application/x-shockwave-flash" flashvars="audioUrl=';
  $playerB = '" src="" width="400" height="27" quality="best"></embed>';
  echo $playerA.$fname.$playerB;
}

# Rotate through conditions: one log line is written per finished participant
function getCondition(){
  global $numConditions;
  if (!file_exists("results/log.txt")) {
    return 0;
  }
  # file_get_contents avoids fread() failing on an empty log file
  $contents = file_get_contents("results/log.txt");
  return substr_count($contents, "\n") % $numConditions;
  #return 0;  # always use the first condition
  #return rand(0,$numConditions-1);  # random first condition
}

# Best guess at the participant's IP address (proxy headers can be spoofed)
function getRealIpAddr(){
  if (!empty($_SERVER['HTTP_CLIENT_IP'])) {   # check ip from shared internet
    $ip = $_SERVER['HTTP_CLIENT_IP'];
  } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {   # check if ip is passed from proxy
    $ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
  } else {
    $ip = $_SERVER['REMOTE_ADDR'];
  }
  return $ip;
}

$prev = "";
$order = array();
$firstTime = false;
$pointer = 0;
$result = "";
$condition = 0;
$timer = 0;
$ip = getRealIpAddr();

if(isset($_POST['prev'])) {
  # Returning participant: unpack progress from the hidden form field
  $prev = explode(";",$_POST['prev']);
  $pointer = (int) $prev[0];
  $order = array_map('intval', explode(",",$prev[1]));
  $result = $prev[2];
  $condition = (int) $prev[3];
  $timeStamp = (int) $prev[4];
  $timer = ((int)$prev[5])+(time() - $timeStamp);  # accumulate time spent so far

  # Record the answer to the previous sentence
  if(isset($_POST['Yes'])) {
    $result = $result."Y";
  }
  if(isset($_POST['No'])) {
    $result = $result."N";
  }
} else {
  # First visit: randomise the test order and assign a condition
  $firstTime = true;
  $order = range(1,$numTests);
  shuffle($order);
  $condition = getCondition();
}

$timeStamp = strval(time());
$pointer = strlen($result);  # number of test sentences answered so far
$orderStr = implode(",",$order);
$prev = strval($pointer).";".$orderStr.";".$result.";".$condition.";".$timeStamp.";".$timer.";".$ip;
# echo '<br />'.$order[$pointer];  # debugging

if($pointer >= $numTests){
  # Finished: append this participant's record to the log
  $file = fopen("results/log.txt",'a+');
  fwrite($file, $prev."\n");
  fclose($file);
  echo 'Thank you for taking part in my experiment!';
  echo '<br /><br />Back to <a href="">Replicated Typo</a>.';
  $prev= '';
} else {
  echo '<font size="14">Language Experiment</font> <br /><br />';

  # Training recording for each condition
  $conditionFiles = array('sounds/A.mp3', 'sounds/B.mp3', 'sounds/C.mp3', 'sounds/D.mp3');
  $conditionFile = $conditionFiles[$condition];

  if($firstTime){
    # Training page: instructions plus the training recording
    echo '<b>This Experiment uses sound - turn on your speakers or plug in your headphones!</b><br /><br />';
    echo "This experiment tests your memory.  Don't take notes!<br /><br />";
    echo 'First, you will listen to a number of sentences from an alien language. <br />Listen carefully because you will be tested on them. <br /> Press the play button below and listen <b>only once</b>.<br /> Listen <b>all the way through</b>.<br /><br />Then press "To test" to begin the test. <br />';
    echo '<br />';
    echo '<b>The alien language:</b>';
    echo '<br />';
    player($conditionFile);
  } else {
    # Test page: instructions plus the current test sentence
    echo 'TEST<br />';
    echo 'Recordings of sentences will appear below.  Each sentence may be one you have already heard, or it may be from a different language.<br />';
    echo 'For each sentence, press play to listen to it.  <br />Listen to each sentence <b>ONLY ONCE</b>.<br />';
    echo "Then indicate whether you think you've heard it before.<br />";
    echo '<br />There are 12 test sentences.';
    echo '<br /><br />';
    # Alternate the box colour so that each new sentence looks visibly different
    if($pointer % 2==($condition%2)){
      echo '<div style="background-color:#C0C0C0;width:500px;padding:10px;">';
    } else {
      echo '<div style="background-color:#8C8C8C;width:500px;padding:10px;">';
    }
    echo '<b>Sentence '.strval($pointer+1).'</b>';
    echo '<br />';
    player($prefix.$order[$pointer].'.mp3');
    echo '<br /><br />Have you heard this sentence before?<br />';
  }

  # The form posts back to this file, carrying the progress string in a hidden field
  echo '      <form action="'.$nameOfThisFile.'" method="post">';  # action is name of this file
  echo '      <input type="hidden" name="prev" value="'.htmlspecialchars($prev).'">';
  if($firstTime){
    echo '<br />';
    echo '<input type="submit" name="OK" value="TO TEST ->">';
  } else {
    echo '      <input type="submit" name="Yes" value="Yes">';
    echo '   <input type="submit" name="No" value="No">';
  }
  echo '</form>';
  if(!$firstTime){
    echo '</div>';  # close the coloured box opened above
  }
}
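The hidden prev field is a single semicolon-separated string in the order pointer;order;result;condition;timestamp;timer;ip, and the test order inside it is comma-separated. The PHP sketch below packs and unpacks that string with a pair of functions so the format can be tested in isolation. The function names and array keys are hypothetical and are not part of the original script.

```php
<?php
# Hypothetical helpers mirroring the script's hidden "prev" format:
# pointer;order;result;condition;timestamp;timer;ip (order is comma-separated).
function encodeState(array $s): string
{
    return implode(';', [
        $s['pointer'],
        implode(',', $s['order']),
        $s['result'],
        $s['condition'],
        $s['timeStamp'],
        $s['timer'],
        $s['ip'],
    ]);
}

function decodeState(string $prev): array
{
    $f = explode(';', $prev);
    return [
        'pointer'   => (int) $f[0],
        'order'     => array_map('intval', explode(',', $f[1])),
        'result'    => $f[2],          # answers so far, e.g. "YNY"
        'condition' => (int) $f[3],
        'timeStamp' => (int) $f[4],
        'timer'     => (int) $f[5],    # seconds spent so far
        'ip'        => $f[6],
    ];
}

$state = decodeState('0;5,2,9;;3;1700000000;0;127.0.0.1');
echo $state['condition'], "\n";  # 3
```

Keeping the state in the page means the server needs no session storage, but a participant could in principle edit the field. PHP's built-in sessions ($_SESSION) would keep it on the server instead.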
  • Ed

    Negative results aren't embarrassing, they're informative. That you couldn't get them published is a shortcoming of the journal system: researchers should be able to find out what's been tried unsuccessfully by others.

    I must have been tested in the one-speaker condition. Fwiw, I assumed this was a standard implicit learning task, so I was looking for the underlying FSG. Didn't notice that word C was always determined by word A - now that's embarrassing.

  • Is it critical to do the training all at once? I was a little surprised by that feature. What if you broke up the training. There would be the same amount, but two sessions, each followed by testing?

  • Marc

    What would be most publishable is probably a replication of exactly Gomez's study followed by your extension.
    You might also want to consider this:
    Richtsmeier, Peter T, Louann Gerken, Lisa Goffman, and Tiffany Hogan. 2009. Statistical frequency in perception affects children’s lexical production. Cognition 111, no. 3 (June): 372-7. doi:10.1016/j.cognition.2009.02.009.
    It talks about variation there.

  • Thanks for the feedback -
    I have considered having several test rounds, but I was worried about participants learning on test. I guess since there are relatively few test items, it wouldn't be such a big effect.


    Regarding a replication, that's a good idea. However, in one condition in Gomez (2002), all participants score perfectly! This might be hard to replicate. Some colleagues are also doing partial replications of this study with mixed results.

    Thanks for the link to the paper - it's exactly what I was looking for!

    It's also been suggested to me that I should make the experiment easier, and that the X words should be divided between the two speakers so that they don't overlap. This would mean that the adjacent dependencies would be idiosyncratic (for the two-speaker condition) while the long-distance dependencies would stand out as being in common.

  • CH

    "It’s embarrassing to put up negative results, but that’s what I get for doing it all in the public arena." Hardly - well done studies that lead to neg results advance science.

    I participated in your experiment and I have to say that I found the placement of the buttons a little vexing. I kept my mouse on the "yes" button and found myself quickly clicking it, thinking it would start the voice. But, of course, it takes this as an answer and moves on to the next voice. I was in a bit of a rush and I know this led to ~5 false positives. I wonder how many other people might have accidentally done this?

  • Thanks for the tips- yes, I was worried about the effect that you mention. The 'real' experiment will be more tightly controlled.

  • If you're interested, Croft recently did an experiment looking at the production of synchronic variation in relation to grammaticalisation. It's somewhat removed from what you are discussing here, but I'm sure it can be adapted to look at diachronic variation and the emergence of contentive/functional lexicon from a monolithic lexicon (Simon recently posted a paper on this very topic). Anyway, here it is: The origins of grammaticalization in the verbalization of experience