Doublets Network Analysis Reveals the Complexity of the English Language

By Alex Russell

Patrick Farrell, a professor of linguistics, recently wrapped up a joint project with statistics professor Fushing Hsieh that used a language game from the 1870s to build a visual representation of the English language as a network.

Words two steps from "MARE"

Doublets, or Word Ladders, is a puzzle game invented by Lewis Carroll in the 1870s. It requires that you convert one word into another that has the same number of letters, changing only one letter at a time. The catch is that each new word along the way must actually exist in the English language.
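The rule of the game can be read as a network: each word is a node, and two words are linked when they differ in exactly one letter. The sketch below illustrates that idea only; it is not the researchers' method, which the article does not describe. The word list `WORDS` is a tiny toy vocabulary chosen for illustration, and the function names `neighbors` and `ladder` are hypothetical.

```python
from collections import deque
from string import ascii_lowercase

# Toy vocabulary (an assumption for illustration, not a real dictionary).
WORDS = {"head", "heal", "teal", "tell", "tall", "tail",
         "mare", "more", "male", "mole"}


def neighbors(word, vocabulary):
    """Return the words in `vocabulary` that differ from `word` in exactly one letter."""
    found = set()
    for i in range(len(word)):
        for letter in ascii_lowercase:
            if letter != word[i]:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in vocabulary:
                    found.add(candidate)
    return found


def ladder(start, goal, vocabulary):
    """Breadth-first search for a shortest ladder; returns None if no ladder exists."""
    if len(start) != len(goal):
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort so the search order is reproducible from run to run.
        for nxt in sorted(neighbors(path[-1], vocabulary)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(ladder("head", "tail", WORDS))
# ['head', 'heal', 'teal', 'tell', 'tall', 'tail']
print(sorted(neighbors("mare", WORDS)))
# ['male', 'more']
```

With a full English word list in place of the toy set, the same link rule would produce the kind of network the project visualizes, and breadth-first search would find the shortest solution to any solvable puzzle.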

Here is Farrell discussing the project and what it tells us about the English language.

What motivated you to pursue this project?

Networks as entities exist all over the place. In English, the sound system combined with the spelling system has an intricate complexity reflected in the structure of the network that solves the game. It’s interesting to work with statisticians and to see that they are interested not so much in the word games as in the statistical networks.

What has been the most surprising finding on this project?

For this project, the network visualization is a metaphor for the grammar of sounds and how you put the pieces together to make larger words. The network visualization is one aspect of the overall language, a way to visualize our knowledge of a language. It is an authentic network for something that exists in the world in a delimited way like the English language. When you map out the English language as a network you can see the beautiful complexity that reflects the pattern in the language itself that has developed over centuries.

What do you see as the biggest research opportunity right now in your field?

The field of linguistics research is going to be moving in general toward the type of research networks people now have access to through huge databases of information about languages. Google and the structure of the Internet, Google Books and others make it possible to answer old questions and to ask new interesting questions. The amount of data available and the technology to create images out of that data make this possible.

What do you think are some interesting opportunities for interdisciplinary collaborations?

Data scientists, statisticians and linguists can now collaborate more to better map data visually. In this way, statistical technology makes it possible to understand the world better.

Who else’s research on campus are you excited about?

I am very interested in Jack Hawkins’ work on the typology of languages and how word order works and how languages change over time. His work on how people process languages and comprehend language promises to shed light on what causes languages to change.