ISCA Archive Interspeech 2023

Comparison of GIF- and SSL-based Features in Pathological-voice Detection

Akira Sasou, Yang Chen

A system that automatically detects voice pathology from acoustic signals enables non-invasive, low-cost, and objective assessment of speech disorders. It is therefore expected to accelerate and improve the diagnosis and clinical treatment of patients. Pathological voices are symptoms of impairments in the articulation of speech sounds, fluency, and/or voice. We consider direct extraction of features from the glottal flow estimated by glottal inverse filtering (GIF) to be a promising approach to pathological-voice detection. To estimate the glottal flow precisely, we propose a novel GIF method that combines constrained autoregressive hidden Markov model (CAR–HMM) analysis with automatic topology generation of the excitation HMM. To evaluate the effectiveness of features extracted from the estimated glottal flow for pathological-voice detection, we employ the Saarbrücken Voice Database. We also compare the features obtained by the proposed CAR–HMM method with those obtained from pre-trained models based on self-supervised learning (SSL). The experimental results confirm that the CAR–HMM-based method can outperform the SSL-based methods.