ISCA Archive Interspeech 2025

Fine-tuning Parakeet-TDT for Dysarthric Speech Recognition in the Speech Accessibility Project Challenge

Kaito Takahashi, Keigo Hojo, Toshimitsu Sakai, Yukoh Wakabayashi, Norihide Kitaoka

We present our dysarthric speech recognition system submitted to the Interspeech 2025 Speech Accessibility Project Challenge, a competition aimed at improving recognition accuracy for dysarthric speech. Our system achieved first place in the challenge. Models based on self-supervised learning are commonly used for dysarthric speech recognition. However, we hypothesized that fine-tuning a pre-trained model with inherently high recognition accuracy would achieve the best performance. To this end, we developed a system that combines several techniques: data preprocessing to expand the training dataset, data augmentation to improve generalization during fine-tuning, and decoding acceleration to optimize inference speed. As a result, our speech recognizer reduced the WER to 8.11, compared with 17.82 for the Whisper large v2 baseline provided by the organizers.
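For readers less familiar with the metric, the WER figures above can be put in context with a minimal sketch. It assumes the standard word-level edit-distance definition of WER and plain whitespace tokenization; the challenge's actual scoring pipeline (for example, its text normalization) may differ, and the function names `word_error_rate` and `relative_reduction` are hypothetical names chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length.

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction in percent of `system` with respect to `baseline`."""
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    # WER values reported in the abstract.
    print(round(relative_reduction(17.82, 8.11), 1))
```

Applied to the reported figures, this gives a relative WER reduction of roughly 54.5% over the baseline; the ratio does not depend on the unit in which the WER is expressed.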