ISCA Archive Interspeech 2025

A semi-automatic pipeline for transcribing and segmenting child speech

Polychronia Christodoulidou, James Tanner, Jane Stuart-Smith, Michael McAuliffe, Mridhula Murali, Amy Smith, Lauren Taylor, Joanne Cleland, Anja Kuschmann

This study evaluates automated transcription (WhisperX) and forced alignment (MFA) in developing a semi-automated pipeline for obtaining acoustic vowel measures from field recordings of 275 children speaking Scottish English, a non-standard dialect of English. As expected, manually correcting the speech transcriptions before forced alignment improves the quality of the acoustic vowel measures relative to manually annotated data, though speech style and recording environment present some challenges for both tools. Adapting the pre-trained MFA english_us_arpa acoustic model to the children's speech also improves the quality of the acoustic measures, though increasing the training sample size did not yield further improvement.
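As a rough illustration of what comparing pipeline output "with respect to manually-annotated data" can look like, the sketch below computes the mean absolute difference between formant values from a manual annotation and from two pipeline variants. It is not the evaluation method used in the study: the metric, the function name `mean_abs_diff`, the token IDs, and all F1 values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_abs_diff(manual, automatic):
    """Mean absolute difference (Hz) over vowel tokens present in both sets."""
    shared = manual.keys() & automatic.keys()
    if not shared:
        raise ValueError("no vowel tokens in common")
    return mean(abs(manual[t] - automatic[t]) for t in shared)


# Hypothetical F1 values in Hz, keyed by made-up speaker_word_vowel token IDs.
manual_f1 = {"c01_cat_AE": 850.0, "c01_bed_EH": 700.0, "c02_cat_AE": 900.0}

# Hypothetical measures from alignments built on uncorrected transcripts.
uncorrected_f1 = {"c01_cat_AE": 790.0, "c01_bed_EH": 760.0, "c02_cat_AE": 840.0}

# Hypothetical measures from alignments built on manually corrected transcripts.
corrected_f1 = {"c01_cat_AE": 840.0, "c01_bed_EH": 710.0, "c02_cat_AE": 880.0}

err_uncorrected = mean_abs_diff(manual_f1, uncorrected_f1)
err_corrected = mean_abs_diff(manual_f1, corrected_f1)

print(f"uncorrected transcripts: {err_uncorrected:.1f} Hz")
print(f"corrected transcripts:   {err_corrected:.1f} Hz")
```

A smaller value means the automatic measures sit closer to the manual reference. The same comparison could be repeated for the pre-trained and adapted acoustic models, assuming measures for the same vowel tokens are available from each.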