This study uses phonotactic and pitch pattern modeling to automatically assess toddlers' language background from short vocalization segments. The experiments are conducted on audio recordings of twelve 25.31 months old US-born and Shanghainese toddlers. Each recording captures a whole-day soundtrack of an ordinary day in the toddler's life, spent in their natural environment. In a preliminary study, we observed that, despite the limited linguistic content of early-age child vocalizations, certain phonotactic and prosodic patterns were correlated with the child's language background. In the current effort, we analyze to what extent these language-salient cues can be leveraged for automatic language background classification. In addition to traditional parallel phone recognition with statistical language modeling (PPRLM) and phone recognition with support vector machines (PRSVM), a novel scheme that utilizes pitch patterns (PPSVM) is proposed. The classification results on very short vocalizations (on average less than 3 seconds long) confirm that both phonotactic and prosodic features capture language-specific content, reaching equal error rates (EER) of 32.45% for PRSVM, 31.33% for PPSVM, and 29.97% for a fusion of the PRSVM and PPSVM systems. The competitive performance of PPSVM suggests that pitch contours carry a significant portion of the language-specific information in toddlers' vocalizations.
Index Terms: language background assessment, toddlers, child vocalization, phonotactic modeling, pitch patterns, PPRLM, PRSVM, PPSVM.