ISCA Archive Interspeech 2025

Stuttering Detection Based on Self-Attention Weights of Temporal Acoustic Vector Sequence

Genzo Miyahara, Tsuneo Kato, Akihiro Tamura

Stuttering detection is gaining attention as a means of automatically monitoring the condition of persons who stutter and of developing inclusive automatic speech recognition (ASR). Stuttering has various symptoms, including repetitions of sounds, syllables, or words; prolongations of sounds; blocks; and interjections. Their acoustic properties also vary greatly between individuals. Detection performance has improved with the introduction of deep learning techniques and larger datasets of stuttered speech. However, these datasets are still much smaller than datasets of fluent speech, which leads to model overfitting. We propose a feature based on the self-attention weights of a temporal acoustic vector sequence. This feature efficiently captures the temporal structure characteristic of stuttered speech while being less sensitive to the varied acoustic properties. Experimental results showed that a model using multi-layer self-attention weight features of wav2vec 2.0 outperformed a previous attention-based model that also used wav2vec 2.0.
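As a rough illustration of why self-attention weights might expose temporal structure independently of a speaker's particular acoustics, the sketch below computes scaled dot-product attention weights over a toy frame sequence. It is a minimal sketch under stated assumptions, not the authors' method. Queries and keys are taken to be the raw frame vectors, with no learned projections (an assumption; wav2vec 2.0 uses learned multi-head projections). The function name `self_attention_weights` is hypothetical. Because each row is softmax-normalized, the matrix records which frames resemble which rather than their absolute values. A repeated segment therefore shows up as off-diagonal weight equal to the diagonal weight.

```python
import math


def self_attention_weights(frames):
    """Return the T x T scaled dot-product self-attention weight matrix.

    Hypothetical helper. Assumption: queries and keys are the raw frame
    vectors themselves (no learned W_Q / W_K projections).
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in frames:
        # Similarity of this query frame to every frame in the sequence.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in frames]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Each row is normalized to sum to 1 (softmax over keys).
        weights.append([e / total for e in exps])
    return weights


# Toy "repetition": frame pattern A, B, A, B (e.g., a repeated syllable).
A, B = [1.0, 0.0], [0.0, 1.0]
W = self_attention_weights([A, B, A, B])
for row in W:
    print([round(w, 3) for w in row])
```

In a real pipeline, one would instead read the per-layer, per-head attention maps of a pretrained wav2vec 2.0 model. For example, the Hugging Face `transformers` `Wav2Vec2Model` can return them when called with `output_attentions=True`. This abstract does not describe how the paper selects, pools, or classifies these maps.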