ISCA Archive Interspeech 2024

Using wav2vec 2.0 for phonetic classification tasks: methodological aspects

Lila Kim, Cédric Gendrot

Self-supervised learning, particularly in the context of speech, has been shown to be effective in a variety of tasks such as speaker recognition and speaker verification. Our research question is whether vector representations extracted from longer sequences detect nasality more effectively than those extracted from shorter ones. Two approaches were studied: extracting vectors over the duration of the phoneme alone, and extracting them from a longer sequence with one second added on each side of the phoneme, then recovering the central part a posteriori. The results show that the models behave differently depending on the phone and the speaker, with variability observed at both levels. The long-sequence model outperformed the short-sequence model, correlating more robustly with nasal airflow.
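The two extraction strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a wav2vec 2.0 feature extractor, and the 16 kHz sampling rate and the stride of 320 samples (20 ms) per output frame are assumptions based on the standard wav2vec 2.0 configuration. The function names `short_sequence` and `long_sequence` are likewise illustrative.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz (assumed input sampling rate)
FRAME_STRIDE = 320     # samples per output frame, i.e. 20 ms (assumed)
CONTEXT_S = 1.0        # seconds of context added on each side of the phone


def toy_encoder(wave):
    """Hypothetical stand-in encoder: one 2-dim vector per 320-sample hop."""
    n_frames = len(wave) // FRAME_STRIDE
    frames = wave[: n_frames * FRAME_STRIDE].reshape(n_frames, FRAME_STRIDE)
    return np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)


def _to_samples(start_s, end_s):
    """Convert phone boundaries in seconds to sample indices."""
    return int(round(start_s * SAMPLE_RATE)), int(round(end_s * SAMPLE_RATE))


def short_sequence(wave, start_s, end_s, encoder=toy_encoder):
    """Short approach: encode only the samples of the phone itself."""
    a, b = _to_samples(start_s, end_s)
    return encoder(wave[a:b])


def long_sequence(wave, start_s, end_s, encoder=toy_encoder):
    """Long approach: encode the phone plus context, then keep the
    central frames that cover the phone (recovered a posteriori)."""
    a, b = _to_samples(start_s, end_s)
    pad = int(CONTEXT_S * SAMPLE_RATE)
    lo, hi = max(0, a - pad), min(len(wave), b + pad)  # clip at file edges
    feats = encoder(wave[lo:hi])
    first = (a - lo) // FRAME_STRIDE
    last = max(first + 1, (b - lo) // FRAME_STRIDE)    # keep at least one frame
    return feats[first:last]
```

Both functions return one vector per frame covering the phone. With the toy encoder the two outputs coincide, because each frame depends only on its own samples. With a contextual encoder such as wav2vec 2.0, the frames returned by `long_sequence` would also reflect the surrounding second of speech on each side, which is the difference the study evaluates.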