Today I’ve been learning more about network structure (from Cris Moore) and I’ve applied my poor understanding and overconfidence to find language families from etymology data!

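Here’s what I understand so far (see Clauset, Moore, & Newman, 2008): The modularity of a network measures how well a given split into ‘communities’ fits the network. It compares the fraction of edges that fall within communities to the fraction you would expect if the edges were placed at random. An optimal clustering is the split that maximises this score, so that nodes are densely connected within modules and sparsely connected between them. In principle you could search all the possible clusterings to find this optimum, but for anything beyond a tiny network there are far too many, so in practice heuristics are used. I’m still hazy on how this is actually done. You can also extend the idea to find hierarchies, as in phylogenetics, but without some of the assumptions. Luckily, there’s a network analysis program called Gephi that does this automatically!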
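To make the modularity score a bit more concrete, here is a minimal sketch in Python of the standard (Newman) modularity of one given split. It only scores a clustering; it does not search for the best one. The function name `modularity` and the toy graph (two triangles joined by a single edge) are made up for illustration and are not the etymology data.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Score a split of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community_of: dict mapping every node to a community label.
    Returns the sum over communities of
    (fraction of edges inside) - (fraction of edge ends attached)^2.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical toy graph: two triangles joined by one bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Split 1: each triangle is its own community.
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# Split 2: everything lumped into one community.
one_group = {node: "all" for node in range(6)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

The split into two triangles scores higher than lumping everything together, which scores zero. A community-detection program is, roughly speaking, hunting for the split that makes this number as large as possible.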
