I've been reading through an earlier draft of my dissertation and noticed a few paragraphs that were omitted due to word length. Despite not making the final cut, they serve as a nice reminder about where our data comes from: when we dive into WALS or UPSID, take a particular inventory and look at one of its phonemes, we're viewing something that's been ascribed by the investigators/observers of that language. Anyway, the paragraphs are basically about the Wichí language -- a member of the Matacoan language family spoken in parts of South America's Chaco region -- and the various reports on its phoneme inventory size. N.B. The source is a PhD thesis by Megan Avram (2008).
Even if we accept the theoretical justification for the concept of a phoneme, there is still the additional problem of how these representations are measured and recorded. These problems are neatly highlighted in the debates surrounding the Wichí language and its phoneme inventory. For instance, back in 1981 Antonio Tovar published an article showing that Wichí had 22 consonants, whereas if you jump forward 13 years to 1994, Kenneth Claesson's paper would tell you that it is down to just 16 consonants. This is quite a big difference. In WALS terms, Wichí has gone from having an average consonant inventory to a moderately small one. Great news, then, for those of you searching for a correlation between small communities (Wichí has approximately 25,000 speakers) and phoneme inventory size. Not so great on the reliability front.
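To make the category jump concrete, here is a minimal sketch of the binning. The cut-offs are my recollection of the consonant inventory classes in the WALS chapter on consonant inventories, so treat them as an assumption to check against WALS itself; the function and variable names are purely illustrative.

```python
# Consonant inventory size classes, as I recall them from WALS
# (assumed cut-offs; verify against the WALS chapter before relying on them).
WALS_CONSONANT_BINS = [
    (6, 14, "small"),
    (15, 18, "moderately small"),
    (19, 25, "average"),
    (26, 33, "moderately large"),
    (34, None, "large"),  # open-ended upper class
]


def classify_consonant_inventory(n_consonants: int) -> str:
    """Return the size class for a reported consonant count."""
    for low, high, label in WALS_CONSONANT_BINS:
        if n_consonants >= low and (high is None or n_consonants <= high):
            return label
    raise ValueError(f"no size class covers {n_consonants} consonants")


# The two counts reported for Wichí in the text above.
reports = {"Tovar (1981)": 22, "Claesson (1994)": 16}

for source, count in reports.items():
    print(f"{source}: {count} consonants -> {classify_consonant_inventory(count)}")
```

Same language, two descriptions, two different bins: the classification is only as stable as the analysis behind each count.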
Short of a conspiracy to bring the number of phonemes down (but see here), the reasons for these differences are broad and varied. Some could reflect genuine differences between speech communities in the form of dialectal variation. Others are more likely to be theoretically motivated. Take, as one of many examples, Claesson’s choice to omit glottalized consonants from his description of Wichí, his rationale being that these “are actually consonant clusters of a stop followed by a glottal stop” (Avram, 2008: 37-38). In summary, both sources of data are at the whims of subjectivity: for each language, or dialect, the study is reliant on the choices of potentially one researcher, at a very specific point in time, and with only a finite amount of resources (for a similar discussion, see the comments on Everett and recursion).
It's straight out of phoneme inventories 101, but from time to time these little examples are useful as cautionary tales about the sources of data we often take for granted.