ISCA Archive Interspeech 2022

Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments

Satwik Dutta, Sarah Anne Tao, Jacob C. Reyna, Rebecca Elizabeth Hacker, Dwight W. Irvin, Jay F. Buzhardt, John H.L. Hansen

Monitoring child development in terms of speech and language skills has a long-term impact on children's overall growth. As student diversity continues to expand in US classrooms, there is a growing need to benchmark social-communication engagement from both teacher-student and student-student perspectives. Given the various challenges of direct observation, deploying speech technology can assist in extracting meaningful information for teachers. This information will help teachers identify and respond to students in need, immediately impacting their early learning and interest. This study explores various hybrid ASR solutions for low-resource spontaneous speech from preschool children (3-5 years old, with and without developmental delays) who are engaged in various activities and interacting with teachers and peers in naturalistic classrooms. Various out-of-domain corpora, both scripted and spontaneous and covering both wide and limited age ranges, were considered. Acoustic models based on factorized TDNNs infused with attention, along with both N-gram and RNN language models, were considered. Results indicate that young children have significantly different, still-developing articulation skills compared to older children. Out-of-domain transcripts of interactions between young children and adults, however, enhance language model performance. Overall, transcription of such data, including various non-linguistic markers, poses additional challenges.