ISCA Archive Interspeech 2025

Towards Classification of Typical and Atypical Disfluencies: A Self Supervised Representation Approach

Priyanka Kommagouni, Pragya Khanna, Vamshiraghusimha Narasinga, Anirudh Bocha, Anil Kumar Vuppala

This paper investigates the nuanced distinctions between typical and atypical speech disfluencies, focusing on features captured by self-supervised models. Typical disfluencies are natural, non-pathological irregularities in speech, while atypical disfluencies are linked to speech disorders such as stuttering and are characterized by more frequent and severe disruptions. Despite progress in automatic disfluency detection, limited research addresses the direct classification of these two types. This study leverages intermediate representations from four pretrained models (Wav2Vec2.0, HuBERT, WavLM, and TERA) to analyze and classify typical and stuttered disfluencies. The experiments use two novel Indian English datasets, IIITH-IEDE and IIITH-TISA, enabling a comprehensive analysis of disfluency patterns in a linguistically diverse context. Classification experiments with support vector machines (SVM) and convolutional neural networks (CNN) reveal that features from HuBERT’s 5th layer, which balance low-level acoustic and high-level semantic information, achieve a peak F1 score of 0.97. These findings highlight the importance of intermediate-layer representations of self-supervised models in distinguishing nuanced speech variations and contribute to robust and interpretable systems for speech disfluency classification.
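
As an illustration of the two quantities the abstract refers to, a layer-specific feature and the F1 score, the following is a minimal sketch and not the authors' implementation. It assumes that per-layer, frame-level hidden states are already available as nested lists (in practice they would come from a pretrained model such as HuBERT), and that each utterance is summarized by mean pooling over time; the abstract does not state the pooling strategy, so that choice is an assumption. The function names `mean_pool_layer` and `f1_score`, the toy data, and the binary (single positive class) form of F1 are likewise hypothetical choices made for this example.

```python
from statistics import fmean


def mean_pool_layer(hidden_states, layer):
    """Average one layer's frame-level vectors over time.

    hidden_states: list indexed by layer; each entry is a list of frames,
    and each frame is a list of floats (assumed layout, for illustration).
    Returns a single utterance-level vector for the chosen layer.
    """
    frames = hidden_states[layer]
    dim = len(frames[0])
    return [fmean(frame[d] for frame in frames) for d in range(dim)]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined here as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-in for model outputs: 2 layers, 2 frames, 2 dimensions.
toy_hidden_states = [
    [[0.0, 0.0], [0.0, 0.0]],  # layer index 0
    [[1.0, 2.0], [3.0, 4.0]],  # layer index 1
]
utterance_vector = mean_pool_layer(toy_hidden_states, layer=1)
print(utterance_vector)  # [2.0, 3.0]

# Toy labels: 1 = atypical (stuttered), 0 = typical (labels are hypothetical).
print(round(f1_score([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.667
```

In a real pipeline, the pooled vector from the chosen layer would be the input to a classifier such as an SVM, and F1 would be computed on held-out predictions.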