ISCA Archive Interspeech 2025

Evaluating Automatic Speech Recognition Pipelines for Mandarin-English Bilingual Child Language Assessment in Telehealth

Hongchen Wu, Yao Du, Zirong Li, Yixin Gu, Disha Thotappala Jayaprakash, Li Sheng

Bilingualism is rising worldwide, yet bilingual child assessments face major challenges. A shortage of bilingual clinicians and the labor-intensive nature of speech data annotation often lead to misdiagnoses, delaying both care and research. Using a Mandarin-English adult-child speech dataset (53 telehealth sessions), we explore how speech models can automate the annotation of clinical data involving multiple languages, multiple speakers, children's speech, and code-switching utterances. Our findings indicate that simple pre-processing improves automatic speech recognition (ASR) accuracy. Specifically, integrating speaker diarization with OpenAI’s Whisper medium model reduces word error rates to 35% for child speech and 30% for code-switching utterances, rivaling fine-tuned transformer models. As the first ASR pipeline evaluation for a Mandarin-English clinical dataset, our study highlights model limitations, establishes a benchmark for bilingual speech technology, and improves clinical services.