We present an approach to improving automatic speech recognition (ASR) for children’s reading assessment in Spanish by incorporating disfluencies into the decoding process. Existing ASR-based assessment methods use hybrid or end-to-end models, and some rely on task-specific lattices or n-gram models to handle potentially disfluent speech. In this work, we compare both families of approaches, in combination with in-domain fine-tuning of the acoustic model, on a dataset of Spanish-speaking children reading aloud. The task-specific n-gram model is trained on a synthetic dataset of disfluent text generated automatically from the reference text with our disfluentES toolkit. Results show that modeling disfluencies improves both ASR performance and fluency assessment performance.
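
To illustrate the general idea of generating synthetic disfluent text from a reference passage, a minimal Python sketch follows. It is not the disfluentES toolkit and does not reproduce its API: the function names, the three disfluency types (omission, repetition, false start), and the probabilities are all assumptions chosen for illustration.

```python
import random


def make_disfluent(reference, rng, p_repeat=0.15, p_false_start=0.10, p_skip=0.05):
    """Return one disfluent variant of a reference sentence.

    Hypothetical helper: the disfluency types and default probabilities
    are illustrative assumptions, not values from the paper.
    """
    out = []
    for word in reference.split():
        r = rng.random()
        if r < p_skip:
            # Omission: the reader skips the word entirely.
            continue
        if r < p_skip + p_repeat:
            # Repetition: the word is read twice.
            out.append(word)
        elif r < p_skip + p_repeat + p_false_start and len(word) > 3:
            # False start: a truncated fragment precedes the full word.
            out.append(word[: len(word) // 2] + "-")
        out.append(word)
    return " ".join(out)


def synthetic_corpus(reference, n, seed=0):
    """Generate n disfluent variants of the reference, reproducibly."""
    rng = random.Random(seed)
    return [make_disfluent(reference, rng) for _ in range(n)]


if __name__ == "__main__":
    for line in synthetic_corpus("el gato duerme en la ventana", 5):
        print(line)
```

A corpus produced this way could then be passed to any standard n-gram training tool to estimate a language model specific to one reading passage. Which tool to use, and which disfluency types best match children's reading errors, are left open here.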