ISCA Archive Interspeech 2024

Searching for Structure: Appraising the Organisation of Speech Features in wav2vec 2.0 Embeddings

Patrick Cormac English, John D. Kelleher, Julie Carson-Berndsen

Recent advancements in speech recognition have been driven by large transformer models trained on extensive unlabelled speech corpora. These models generate speech representations that potentially encapsulate key speech features, yet the organisation of these features within the model's embedding space and their alignment with phonetic and phonological theories remain unclear. This paper aims to bridge this gap by applying probing methods to explore the structure of phonetic information within embeddings, thereby uncovering linguistically significant relationships within the latent representations. We introduce a novel approach that probes the speech embeddings for independent features and then applies association rule mining to identify relationships and organisational structure within the data. Our research seeks to enhance the understanding of the speech embeddings of transformer models, ultimately contributing to the explainability of these systems.
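
To make the second stage of the approach concrete, the following is a minimal sketch of association rule mining over probed features. It is an illustration under stated assumptions, not the procedure used in the paper: the feature names, the toy data, the thresholds, and the function `mine_pairwise_rules` are all hypothetical, and the sketch considers only single-feature rules of the form A -> B rather than a full itemset-mining algorithm. It assumes the probes have already produced, for each speech segment, the set of features predicted to be present.

```python
from itertools import permutations


def mine_pairwise_rules(transactions, min_support=0.3, min_confidence=0.9):
    """Return rules (antecedent, consequent, support, confidence).

    support    = fraction of segments containing both features
    confidence = fraction of segments with the antecedent that also
                 contain the consequent
    """
    if not transactions:
        return []
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical probe outputs: features predicted present per segment.
probed = [
    {"nasal", "voiced", "sonorant"},
    {"nasal", "voiced", "sonorant"},
    {"voiced", "sonorant"},
    {"voiced"},
    {"fricative"},
]

rules = mine_pairwise_rules(probed)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

On this toy data the surviving rules are directional: every segment probed as nasal is also probed as voiced, but not the reverse. Asymmetric implications of this kind are what an association rule analysis of probed features would surface.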