ISCA Archive Interspeech 2023

Improved Contextualized Speech Representations for Tonal Analysis

Jiahong Yuan, Xingyu Cai, Kenneth Church

We propose fine-tuning wav2vec2.0 with a cross-entropy loss to classify tones in an utterance on a frame-by-frame basis. Our study demonstrates that this approach not only improves tone classification accuracy but also generates frame-level representations suitable for tonal analysis. Using these representations, we show that the rising tone derived from third-tone sandhi in Mandarin speech differs from the lexical rising tone, and that a third tone in a sandhi context that does not undergo sandhi differs from a third tone outside a sandhi context. Our findings suggest that third-tone sandhi in Mandarin Chinese involves a continuous shift from Tone3 to Tone2 rather than a categorical change.
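
The frame-by-frame objective can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each frame has already been mapped to a vector of per-class scores (for example, by a linear layer on top of the wav2vec2.0 frame outputs), and that frames without a tone label are marked with a hypothetical `IGNORE` value and skipped. The function name, the label inventory, and the averaging over labeled frames are all assumptions made for illustration.

```python
import math

IGNORE = -100  # hypothetical marker for frames with no tone label (e.g. silence)


def frame_cross_entropy(logits, labels, ignore_index=IGNORE):
    """Mean cross-entropy over labeled frames.

    logits: list of per-frame score lists, one score per tone class.
    labels: list of per-frame class ids; frames equal to ignore_index are skipped.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue
        # Numerically stable log-sum-exp gives the log of the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the reference class for this frame.
        total += log_z - scores[label]
        count += 1
    return total / count if count else 0.0


# Three frames, four assumed tone classes; the last frame is unlabeled.
example_logits = [[2.0, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
example_labels = [0, 2, IGNORE]
loss = frame_cross_entropy(example_logits, example_labels)
```

In a fine-tuning setup, a loss of this form would be minimized with respect to both the classifier and the underlying speech model, so the frame-level representations themselves adapt to tone.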