ISCA Archive Interspeech 2025

Improving Automatic Speech Recognition for Children's Reading Assessment with Disfluency-aware Language Models

Jazmín Vidal, Luciana Ferrer, Juan Esteban Kamienkowski, Pablo Riera

We present an approach to improve automatic speech recognition (ASR) quality for children’s reading assessment in Spanish by incorporating disfluencies into the decoding process. Existing ASR-based assessment methods use hybrid or end-to-end models, and some rely on task-specific lattices or n-gram models to handle potentially disfluent speech. In this work, we compare both families of approaches, in combination with in-domain fine-tuning of the acoustic model, on a dataset of Spanish-speaking children reading aloud. The task-specific n-gram model is learned on a synthetic dataset of disfluent text generated automatically from the reference text with our disfluentES toolkit. Results show that modeling disfluencies improves both ASR performance and fluency assessment performance.
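To make the idea of a task-specific n-gram model trained on synthetic disfluent text more concrete, the following is a minimal Python sketch. It does not use or reproduce the disfluentES toolkit. The function names (`make_disfluent`, `bigram_counts`, `bigram_prob`), the three disfluency types, the filler token "eh", the probability `p`, and the add-one smoothing are all illustrative assumptions.

```python
import random
from collections import Counter

def make_disfluent(words, rng, p=0.3):
    """Return a copy of `words` with randomly injected disfluencies.

    Hypothetical rules, assumed for illustration only: whole-word
    repetition, a truncated false start, or a filled pause ("eh").
    """
    out = []
    for w in words:
        r = rng.random()
        if r < p / 3:
            out.extend([w, w])                       # repetition
        elif r < 2 * p / 3 and len(w) > 2:
            out.extend([w[: len(w) // 2] + "-", w])  # false start, then the word
        elif r < p:
            out.extend(["eh", w])                    # filled pause before the word
        else:
            out.append(w)                            # fluent reading
    return out

def bigram_counts(sentences):
    """Count bigrams over sentences padded with <s> and </s>."""
    counts = Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        counts.update(zip(toks, toks[1:]))
    return counts

def bigram_prob(counts, prev, word, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    prev_total = sum(c for (a, _), c in counts.items() if a == prev)
    return (counts[(prev, word)] + 1) / (prev_total + vocab_size)

# Build a small synthetic corpus from one reference sentence.
reference = "el perro corre por el parque".split()
rng = random.Random(0)  # fixed seed so the corpus is reproducible
corpus = [make_disfluent(reference, rng) for _ in range(200)]
counts = bigram_counts(corpus)
vocab = {t for s in corpus for t in s} | {"</s>"}

# Compare a repetition bigram with a bigram never seen in the corpus.
print(bigram_prob(counts, "perro", "perro", len(vocab)))
print(bigram_prob(counts, "perro", "parque", len(vocab)))
```

Because every variant keeps the reference words in their original order, a model estimated this way concentrates probability on the expected text plus the disfluency patterns it was shown. In practice a standard n-gram toolkit with a higher order and better smoothing would likely replace the hand-rolled bigram estimate.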