ISCA Archive Interspeech 2024

When Whisper Listens to Aphasia: Advancing Robust Post-Stroke Speech Recognition

Giulia Sanguedolce, Sophie Brook, Dragos C. Gruia, Patrick A. Naylor, Fatemeh Geranmayeh

Despite recent advances in Automatic Speech Recognition (ASR), accuracy remains low for pathological speech, which limits AI-based healthcare interventions in such settings. This work addresses this challenge by fine-tuning Whisper, an ASR model known for its ability to capture high-dimensional features in healthy speech. Using our comprehensive dataset of patients with stroke, we fine-tuned Whisper and significantly reduced the Word Error Rate (WER), surpassing previous work on severe aphasia. To demonstrate generalisability, we tested the model on a separate database, AphasiaBank, and observed a lower WER despite differences in dialect, linguistic characteristics, and test protocols. Our result on AphasiaBank was superior to those of previous ASR systems trained on this database, confirming the generalisability of our approach. These outcomes not only address ASR limitations for impaired speech but also lay the foundations for standardised and versatile AI solutions for remote speech monitoring, enabling timely diagnosis and intervention.
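WER, the metric used throughout, is the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the ASR hypothesis into the reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level Levenshtein alignment. It is an illustration only: the function name `word_error_rate` is hypothetical, and it assumes simple whitespace tokenisation with no text normalisation (case, punctuation, fillers), which may differ from the normalisation used in the paper's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: plain whitespace tokenisation, no normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))  # row 0: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when a hypothesis contains many spurious words, which is one reason it can be high for fragmented aphasic speech.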