ISCA Archive Interspeech 2025

Phonetic Posteriorgram-Based Phoneme Selection for Vocal Cord Disorder Classification in Continuous Mandarin Speech

Chih-Ning Chen, Yu-Lan Chuang, Ming-Jhang Yang, Wei-Cheng Hsu, Yung-An Tsou, Yi-Wen Liu

Automatic classification of vocal cord disorders (VCDs) in dysphonia benefits society by enabling home screening when immediate clinical consultation is unavailable. Previous studies have focused on VCD classification from single vowels or isolated words. This research advances the field by classifying VCDs from continuous speech, using data from diagnosed patients. We parse continuous speech into phonemes based on phonetic posteriorgrams (PPGs) and investigate VCD classification using the Mel-frequency cepstral coefficients (MFCCs) corresponding to each Mandarin phoneme as features. Results show a 15% accuracy improvement over both a baseline model that ignores phonetic context and a prior study on the same patients that used single-word utterances and sustained vowels. Our findings enhance the understanding of phonetic characteristics in VCDs and underscore the significance of continuous speech in automatic classification.
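To illustrate the idea of phoneme-wise feature extraction, the following is a minimal sketch of how frame-level MFCCs might be grouped by phoneme using a PPG. It is not the authors' implementation. The function name `group_mfcc_by_phoneme`, the argmax frame assignment, the `min_posterior` threshold, and the per-phoneme averaging are all assumptions made for illustration, and the toy values are invented.

```python
from collections import defaultdict


def group_mfcc_by_phoneme(ppg, mfcc, phonemes, min_posterior=0.5):
    """Average MFCC frames per phoneme, using the PPG to label each frame.

    ppg:      list of frames, each a list of posteriors (one per phoneme)
    mfcc:     list of frames, each a list of MFCC coefficients
    phonemes: phoneme labels, in the same order as the PPG columns
    min_posterior: hypothetical confidence threshold (an assumption);
                   frames whose best posterior falls below it are skipped
    """
    if len(ppg) != len(mfcc):
        raise ValueError("ppg and mfcc must have the same number of frames")

    frames = defaultdict(list)
    for posteriors, coeffs in zip(ppg, mfcc):
        # Label the frame with its most probable phoneme (argmax).
        best = max(range(len(posteriors)), key=posteriors.__getitem__)
        if posteriors[best] >= min_posterior:
            frames[phonemes[best]].append(coeffs)

    # Average each coefficient over all frames assigned to a phoneme.
    return {
        ph: [sum(col) / len(vecs) for col in zip(*vecs)]
        for ph, vecs in frames.items()
    }


# Toy data: 4 frames, 3 phonemes, 2 MFCC coefficients per frame.
phonemes = ["a", "i", "sh"]
ppg = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.40, 0.30, 0.30],  # ambiguous frame, below the threshold
    [0.10, 0.10, 0.80],
]
mfcc = [
    [1.0, 2.0],
    [3.0, 4.0],
    [9.0, 9.0],
    [5.0, 6.0],
]
features = group_mfcc_by_phoneme(ppg, mfcc, phonemes)
```

In this sketch, each phoneme that occurs in the utterance ends up with one feature vector, which could then be passed to a phoneme-specific classifier. Phonemes that never win a frame, such as "i" in the toy data, simply produce no entry.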