ISCA Archive Interspeech 2023

An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus

Astik Biswas, Abdelmoumene Boumadane, Stephane Peillon, Gildas Bleas

Advances in speech technology have led to the integration of modern ASR systems into applications such as chatbots, medical dictation, and video transcription. Training conversational ASR requires speech that captures the acoustic cues of spontaneous speech. With its 30k hours of conversational speech, the People's Speech corpus is the largest available spontaneous and conversational corpus and an invaluable resource for such training. In addition, it comes with a commercially friendly license. The corpus is packaged in uniform 15-second segments, which can cut speech off abruptly and yield transcriptions that are not always accurate. This paper presents an effective method for automatic data mining from a small subset of 973 raw original records used by the People's Speech corpus. The paper also proposes an approach for outlier detection and automatic data curation. Results show a 19.7% relative improvement in WER compared to the original segments.