ISCA Archive Interspeech 2024

Introduction To Partial Fine-tuning: A Comprehensive Evaluation Of End-to-end Children’s Automatic Speech Recognition Adaptation

Thomas Rolland, Alberto Abad

Automatic Speech Recognition (ASR) faces unique challenges with children's speech, mainly because little training data is available. Training large ASR models on such limited data is difficult, so fine-tuning a pre-trained model is frequently used instead. However, fine-tuning an entire large pre-trained model on limited children's speech data may lead to overfitting and degraded performance. This study offers a granular evaluation of children's ASR fine-tuning, departing from conventional whole-network tuning. We present a partial fine-tuning approach that highlights the importance of the Encoder and Feedforward Neural Network modules in Transformer-based models. Notably, this method outperforms whole-model fine-tuning, with a relative word error rate improvement of 9% when data is limited. Our findings highlight the critical role of partial fine-tuning in advancing children's ASR model development.
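As a rough illustration of what partial fine-tuning means in practice, the sketch below marks only a chosen subset of a model's parameters as trainable and leaves the rest frozen. It is a minimal sketch under stated assumptions, not the authors' implementation: the parameter names, the `ffn` naming convention, and the helper functions are hypothetical, and the WER values used to illustrate a relative improvement are made-up numbers rather than results from the paper.

```python
from typing import Dict, Iterable


def select_trainable(param_names: Iterable[str],
                     trainable_keywords: Iterable[str]) -> Dict[str, bool]:
    """Map each parameter name to True (fine-tune) or False (keep frozen).

    A parameter is fine-tuned if its name contains any of the keywords.
    """
    keywords = tuple(trainable_keywords)
    return {name: any(k in name for k in keywords) for name in param_names}


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER improvement: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical parameter names for a small Transformer encoder-decoder.
PARAM_NAMES = [
    "encoder.layers.0.self_attn.weight",
    "encoder.layers.0.ffn.weight",
    "decoder.layers.0.self_attn.weight",
    "decoder.layers.0.ffn.weight",
]

# Encoder-only fine-tuning: every decoder parameter stays frozen.
encoder_only = select_trainable(PARAM_NAMES, ["encoder."])

# Feedforward-only fine-tuning: attention parameters stay frozen.
ffn_only = select_trainable(PARAM_NAMES, [".ffn."])

# Illustrative numbers only (not from the paper).
example_gain = relative_wer_improvement(baseline_wer=20.0, new_wer=18.2)
```

In a framework such as PyTorch, the resulting flags would typically be applied by setting `requires_grad` on each named parameter, so that the optimizer only updates the selected modules.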