Santiago Barreda: Vowel Normalization and Social Characteristics
Santiago Barreda is primarily interested in answering questions such as "how do we know whether someone sounds like a man or a woman?" or "how can we guess how tall someone is based only on their voice?". To begin answering these questions, one must first resolve some of the problems that arise with typical data normalization methods, the most notable of which is the (un)intentional erasure of important social characteristics, such as gender, age, and social status, that inherently differ across populations. For example, when comparing the vowel spaces of typical men and women, traditional Lobanov normalization erroneously elides important speaker variation.
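To make that concern concrete, the minimal Python sketch below shows classic Lobanov (z-score) normalization applied to made-up F1/F2 values; the data and function name are hypothetical illustrations, not code from Santiago's own work. It simply demonstrates how per-speaker z-scoring can wipe out a uniform size difference between two speakers' vowel spaces.

```python
import numpy as np

def lobanov_normalize(formants):
    """Z-score one speaker's formant measurements.

    formants: array of shape (n_tokens, n_formants), e.g. F1/F2 in Hz.
    Each formant dimension is centered and scaled using that speaker's own
    mean and standard deviation, so overall differences in vowel space
    size or position between speakers are removed.
    """
    formants = np.asarray(formants, dtype=float)
    mean = formants.mean(axis=0)
    sd = formants.std(axis=0, ddof=1)
    return (formants - mean) / sd

# Hypothetical F1/F2 values (Hz) for two speakers producing the same vowels;
# speaker B's vowel space is simply a scaled-up copy of speaker A's.
speaker_a = np.array([[300, 2300], [700, 1200], [500, 1000]])
speaker_b = speaker_a * 1.25

# After Lobanov normalization the two speakers become indistinguishable,
# even though the size difference between them may be socially meaningful.
print(np.allclose(lobanov_normalize(speaker_a), lobanov_normalize(speaker_b)))  # True
```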
His current primary research project involves a reassessment and implementation of a uniform normalization method for vowel spaces developed by Nearey and Assmann in 2007, called the Probabilistic Sliding Template Model (PSTM). The model reasonably assumes that the listener has a good sense of how the vowels of a particular demographic typically sound; a California English speaker, for example, places their vowels in a predictable pattern. Using this information, the model normalizes vowel spaces with log-mean uniform scaling and assigns each speaker in that community a 'psi' parameter representing the speaker's apparent size and the degree of normalization needed, so that the underlying variation present across populations is preserved rather than erased.
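The sketch below illustrates only the log-mean uniform scaling step on assumed data; it is not an implementation of the full PSTM, which evaluates candidate scalings probabilistically against vowel templates. The function name, and the use of the speaker's mean log formant as a stand-in for psi, are assumptions made for illustration.

```python
import numpy as np

def log_mean_uniform_scaling(formants_hz):
    """Single-parameter (uniform) normalization on log formants.

    formants_hz: array of shape (n_tokens, n_formants) for one speaker.
    All formants are log-transformed and shifted by a single speaker-level
    constant (here, the speaker's mean log formant), rather than scaling
    each formant dimension independently as Lobanov normalization does.
    The shift is returned separately: in a PSTM-style analysis a quantity
    like this indexes apparent speaker size instead of being discarded.
    """
    logF = np.log(np.asarray(formants_hz, dtype=float))
    psi_like = logF.mean()  # one scalar per speaker (a simple stand-in for psi, by assumption)
    return logF - psi_like, psi_like

# Hypothetical speakers: B's vocal tract is shorter, so all formants are ~20% higher.
speaker_a = np.array([[300, 2300], [700, 1200], [500, 1000]])
speaker_b = speaker_a * 1.2

norm_a, psi_a = log_mean_uniform_scaling(speaker_a)
norm_b, psi_b = log_mean_uniform_scaling(speaker_b)

print(np.allclose(norm_a, norm_b))  # True: vowel quality information is aligned
print(round(psi_b - psi_a, 3))      # ~0.182 = log(1.2): the speaker-size difference is retained
```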
Santiago is also very interested in the important distinction between acoustic data and phonetic data, whose differences are rarely addressed explicitly, leading to recurring disagreements and misunderstandings in the phonetics literature. Precise definitions of what exactly ought to count as 'phonetic' information, and where the line should be drawn, remain somewhat contentious. He believes that what constitutes phonetic information must be represented at the perceptual level, rather than the production level. For example, phoneme categorization quickly breaks down because the variability in how repeated tokens of the same sound are perceived does not align with the rigid, low-granularity structure of discrete phonetic categories. In other words, the mapping from continuous acoustic input to phonetic representations does not correspond neatly to the categorical distinctions of phonemes or allophones.
Finally, Santiago is heavily involved in regularly updating his software FastTrack, a Praat plugin for semi-automatic formant tracking. For more information about any of this work, please reach out to him!