Developing a robust Automatic Speech Recognition (ASR) system for children is a challenging task because of increased variability in acoustic and linguistic correlates as a function of young age. The acoustic variability is mainly due to the developmental changes associated with vocal tract growth. On the linguistic side, the variability is associated with limited knowledge of vocabulary, pronunciations and other linguistic constructs. This paper presents a preliminary study towards better acoustic modeling, pronunciation modeling and front-end processing for children’s speech. Results are presented as a function of age. Speaker adaptation significantly reduces mismatch and variability, improving recognition results across age groups. In addition, the introduction of pronunciation modeling shows promising performance improvements.
Index Terms: automatic speech recognition, acoustic modeling, pronunciation modeling, acoustic adaptation, front-end features