ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Measuring Language Development From Child-centered Recordings

Yaya Sy, William N. Havard, Marvin Lavechin, Emmanuel Dupoux, Alejandrina Cristia

Standard ways to measure child language development from spontaneous corpora rely on detailed linguistic descriptions of a language as well as exhaustive transcriptions of the child's speech, which today can only be done through costly human labor. We tackle both issues by proposing (1) a new language development metric (based on entropy) that does not require linguistic knowledge other than having a corpus of text in the language in question to train a language model, (2) a method to derive this metric directly from speech based on a smaller text-speech parallel corpus. Here, we present descriptive results on an open archive including data from six English-learning children as a proof of concept. We document that our entropy metric documents a gradual convergence of children's speech towards adults' speech as a function of age, and it also correlates moderately with lexical and morphosyntactic measures derived from morphologically-parsed transcriptions.