ISCA Archive Interspeech 2025

CBA-Whisper: Curriculum Learning-Based AdaLoRA Fine-Tuning on Whisper for Low-Resource Dysarthric Speech Recognition

Tianyi Tan, Xinan Chen, Xiaohuai Le, Wenzhi Fan, Xianjun Xia, Chuanzeng Huang, Jing Lu

Whisper is a powerful automatic speech recognition (ASR) model. However, its zero-shot performance on low-resource speech requires further improvement, especially in dysarthric speech recognition (DSR). This paper addresses the Interspeech 2025 Speech Accessibility Project Challenge (SAPC) by fine-tuning Whisper large-v2 on the SAP 2024-04-30, UA-Speech, and TORGO datasets using adaptive low-rank adaptation (AdaLoRA). We incorporate improved WhisperX processing and rule-based postprocessing designed to handle stuttering and machine hallucination. In addition, we employ curriculum learning (CL) with adaptively optimized data filtering to progressively enhance the performance of our model. Using less than 1/5 of the official training data, our final system ranked 2nd in the challenge, with a WER of 10.51% and a SemScore of 85.50% on the Test 2 split, a relative WER reduction of 41.02% and a relative SemScore improvement of 12.72% over the baseline (WER 17.82%, SemScore 75.85%). Code and models are publicly available.
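
The 41.02% and 12.72% figures are consistent with relative changes against the baseline rather than absolute percentage-point differences. The short check below recomputes them from the four scores quoted in the abstract; the helper name `relative_change` is illustrative and not taken from the authors' code.

```python
def relative_change(baseline: float, system: float) -> float:
    """Relative change of `system` with respect to `baseline`, in percent."""
    return (system - baseline) / baseline * 100.0


# Scores quoted in the abstract (Test 2 split), in percent.
baseline_wer, system_wer = 17.82, 10.51
baseline_sem, system_sem = 75.85, 85.50

# WER is lower-is-better, so negate the change to report it as a reduction.
wer_reduction = -relative_change(baseline_wer, system_wer)
# SemScore is higher-is-better, so the change is reported as a gain.
sem_gain = relative_change(baseline_sem, system_sem)

print(f"Relative WER reduction: {wer_reduction:.2f}%")
print(f"Relative SemScore gain: {sem_gain:.2f}%")
```

For comparison, the absolute differences are 7.31 WER points and 9.65 SemScore points.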