ISCA Archive Interspeech 2025

Challenges and practical guidelines for atypical speech data collection, annotation, usage and sharing: A multi-project perspective

Zhengjun Yue, Mara Barberis, Tanvina Patel, Judith Dineley, Willemijn Doedens, Lottie Stipdonk, YuanYuan Zhang, Elke de Witte, Erfan Loweimi, Hugo Van hamme, Djaina Satoer, Marina Ruiter, Laureano Moro Velazquez, Nicholas Cummins, Odette Scharenborg

Speech technologies have advanced significantly, yet they remain largely trained on typical speech, limiting their applicability to individuals with speech and language impairments. A key obstacle is the lack of well-annotated and representative atypical speech corpora. This paper conducts a multi-project survey and shares first-hand experience of the challenges of collecting, annotating, using, and sharing atypical speech data. Experiences from seven research projects on collecting atypical speech data, involving both academic and clinical perspectives, are reported, and potential issues are discussed. Furthermore, the paper provides practical guidelines for standardising and harmonising data collection practices. Such standardisation is crucial for allowing studies to be compared, replicated, and validated, which in turn is essential for developing more inclusive and effective speech technologies.