ISCA Archive Interspeech 2024

Are Articulatory Feature Overlaps Shrouded in Speech Embeddings?

Erfan A. Shams, Iona Gessinger, Patrick Cormac English, Julie Carson-Berndsen

Domain-informed probing can offer important insights into the types of phonetic information encoded in transformer-based speech recognition models. This paper focuses on phonetic feature probes and investigates whether feature spreading and assimilation are evident in the speech embeddings of a transformer model. Probes are trained for place of articulation, manner of articulation, and voicing features according to the IPA feature classification, and exemplar fricative consonant clusters in which local assimilation would be expected are selected. By following the trajectory of all of these features during inference, and tracking not only the feature with the highest activation value but also alternative activations, we explore how the model encodes coarticulation and transitions between sounds in its latent representations. The patterns identified appear to be in line with expectations from the literature and demonstrate the explanatory power of such an approach.
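
As a minimal sketch of what tracking alternative activations could look like, the following Python snippet ranks probe outputs frame by frame and keeps the top k features rather than only the single best one. The function name `top_k_trajectory`, the dictionary layout, and all activation values are assumptions made for illustration; they are not the probes, data, or results of this paper.

```python
from typing import Dict, List, Tuple


def top_k_trajectory(
    frames: List[Dict[str, float]], k: int = 2
) -> List[List[Tuple[str, float]]]:
    """For each frame, return the k features with the highest activation, best first."""
    trajectory = []
    for activations in frames:
        # Rank all features in this frame by activation, highest first.
        ranked = sorted(activations.items(), key=lambda item: item[1], reverse=True)
        trajectory.append(ranked[:k])
    return trajectory


# Hypothetical place-of-articulation probe outputs for three consecutive frames.
# The numbers are invented to illustrate a gradual shift between two places.
frames = [
    {"alveolar": 0.80, "postalveolar": 0.15, "labiodental": 0.05},
    {"alveolar": 0.50, "postalveolar": 0.45, "labiodental": 0.05},
    {"alveolar": 0.20, "postalveolar": 0.75, "labiodental": 0.05},
]

trajectory = top_k_trajectory(frames, k=2)
for t, ranked in enumerate(trajectory):
    print(t, ranked)
```

In this toy input the second-ranked feature in the middle frame becomes the top-ranked feature in the last frame, which is the kind of transition that inspecting only the highest activation would hide.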