ISCA Archive Interspeech 2023

Discovering Phonetic Feature Event Patterns in Transformer Embeddings

Patrick Cormac English, John D. Kelleher, Julie Carson-Berndsen

Domain-informed probing of large transformer-based speech recognition models offers an opportunity to investigate how phonetic information is captured and transformed in the information-rich embeddings that emerge as part of the recognition process. Previous work in this area has established the efficacy of probing these embeddings with simple multi-layer perceptron models to identify the information patterns encoded at each layer of the transformer. This paper explores the phonetic feature event patterns that evolve at each layer of a transformer model. Probing models are trained on embeddings that are averaged and labelled at the phone level using the TIMIT dataset, and are then used to detect the presence of specific phonetic features at individual time-steps of a speech signal. The paper demonstrates how detecting phonetic features such as voicing, frication and nasality within the embeddings of transformer models provides insight into how speech patterns are encoded in these models.
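The phone-level averaging and labelling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `average_by_phone`, the tiny `PHONE_FEATURES` table, the two-dimensional toy embeddings and the assumption that phone boundaries have already been converted to frame indices are all hypothetical choices made for illustration.

```python
from statistics import fmean

# Hypothetical phone-to-feature table (illustrative subset only,
# not the feature inventory used in the paper).
PHONE_FEATURES = {
    "s": {"voiced": 0, "fricative": 1, "nasal": 0},
    "z": {"voiced": 1, "fricative": 1, "nasal": 0},
    "m": {"voiced": 1, "fricative": 0, "nasal": 1},
    "iy": {"voiced": 1, "fricative": 0, "nasal": 0},
}


def average_by_phone(frames, segments):
    """Mean-pool frame embeddings over each phone segment.

    frames:   list of equal-length float lists, one per time-step.
    segments: list of (start_frame, end_frame_exclusive, phone) tuples;
              boundaries are assumed to be frame indices already.
    Returns a list of (mean_embedding, feature_labels) pairs that could
    serve as training examples for a per-feature probing classifier.
    """
    examples = []
    for start, end, phone in segments:
        span = frames[start:end]
        if not span or phone not in PHONE_FEATURES:
            continue  # skip empty spans and phones outside the toy table
        # Average each embedding dimension across the frames of the phone.
        mean = [fmean(dim) for dim in zip(*span)]
        examples.append((mean, PHONE_FEATURES[phone]))
    return examples


# Toy data: five frames of 2-dimensional embeddings, two phone segments.
frames = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 6.0], [4.0, 8.0]]
segments = [(0, 2, "s"), (2, 5, "m")]
examples = average_by_phone(frames, segments)
```

In a setting like the one the abstract describes, the pooled vectors would come from a chosen transformer layer and each binary label would be the target of a separate probe. The sketch covers only the pooling and labelling stage.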