ISCA Archive Interspeech 2025

AusKidTalk: Using Strategic Data Collection and Out-of-Domain Tools to Semi-Automate Novel Corpora Annotation

Tünde Szalay, Mostafa Shahin, Tharmakulasingam Sirojan, Zheng Nan, Renata Huang, Kirrie Ballard, Beena Ahmed

Annotating speech corpora for novel populations presents a circular problem: eliminating costly manual transcription requires automatic speech recognition (ASR) tools that have not yet been developed, but developing those tools requires annotated speech corpora that are not yet available. For AusKidTalk, a corpus whose speakers form a novel population because of their age and accent, the manual transcription burden was reduced by combining a strategic data collection protocol with out-of-domain ASR tools for semi-automatic annotation. The data collection protocol inserted tones and timestamps so that the recordings could be segmented automatically. Automatic annotation was performed with out-of-domain tools for diarisation (NeMo) and orthographic transcription (UNSW ASR). The resulting transcription accuracy, with a word error rate (WER) of 17% for single words and 23% for continuous speech, allowed annotators to hand-correct transcripts instead of transcribing from scratch, reducing the annotation burden. The workflow can be adapted to other corpora and updated with new ASR tools as they become available.
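For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the AusKidTalk pipeline, which may apply its own text normalisation and scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference`.

    Illustrative sketch only: whitespace tokenisation, no text
    normalisation (case, punctuation), which a real pipeline
    would usually apply before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences, not corpus data.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a WER of 17% corresponds to roughly one word-level edit for every six reference words, which is consistent with the paper's framing of hand-correction as lighter work than transcription from scratch.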