ISCA Archive Interspeech 2015

Automatic detection and annotation of disfluencies in spoken French corpora

George Christodoulides, Mathieu Avanzi

In this paper we propose a multi-step system for the semi-automatic detection and annotation of disfluencies in spoken corpora. A combination of rules, statistical models and machine learning techniques is applied to the input, a transcription aligned to the speech signal. The system relies on automatically estimated prosodic, part-of-speech and shallow syntactic features. We present a detailed coding scheme for simple disfluencies (filled pauses, mispronunciations, false starts, drawls and intra-word pauses), structured disfluencies (repetitions, deletions, substitutions, insertions) and complex disfluencies. The system is trained and evaluated on a transcribed corpus of spontaneous French speech that comprises 112 speakers, is balanced for speaker age and sex, and covers 14 varieties of French spoken in Belgium, France and Switzerland.
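
To make the coding scheme more concrete, the following is a minimal Python sketch of how the categories named above could be represented, together with a toy rule that flags immediately repeated words in a time-aligned transcription. It is not the authors' system: the names `Token`, `Annotation`, `CATEGORIES` and `find_repetitions` are hypothetical, the token timings are invented for illustration, and the rule is a simple exact-match heuristic rather than the rules, statistical models or machine learning techniques described in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class DisfluencyClass(Enum):
    SIMPLE = "simple"
    STRUCTURED = "structured"
    COMPLEX = "complex"


# Category names are taken from the abstract; storing them in a dict is
# an illustrative assumption, not the paper's annotation format.
CATEGORIES = {
    DisfluencyClass.SIMPLE: (
        "filled pause", "mispronunciation", "false start",
        "drawl", "intra-word pause",
    ),
    DisfluencyClass.STRUCTURED: (
        "repetition", "deletion", "substitution", "insertion",
    ),
}


@dataclass
class Token:
    """One word of the transcription, aligned to the speech signal."""
    text: str
    start: float  # seconds (hypothetical values in the example below)
    end: float


@dataclass
class Annotation:
    category: str
    first: int  # index of the first token in the disfluent span
    last: int   # index of the last token in the disfluent span


def find_repetitions(tokens):
    """Toy rule: flag runs of identical adjacent words (case-insensitive)."""
    found = []
    i = 0
    while i < len(tokens) - 1:
        j = i
        # Extend the run while the next token repeats the current word.
        while j + 1 < len(tokens) and tokens[j + 1].text.lower() == tokens[i].text.lower():
            j += 1
        if j > i:
            found.append(Annotation("repetition", i, j))
        i = j + 1
    return found


# Invented example utterance with made-up timings.
words = ["je", "je", "je", "veux", "le", "le", "livre"]
tokens = [Token(w, 0.3 * k, 0.3 * k + 0.25) for k, w in enumerate(words)]
repetitions = find_repetitions(tokens)
```

A real detector would also need the prosodic, part-of-speech and shallow syntactic features mentioned above, since exact word matching cannot distinguish disfluent repetitions from intentional ones.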