ISCA Archive DiSS 2003

A disfluency study for cleaning spontaneous speech automatic transcripts and improving speech language models

Martine Adda-Decker, Benoît Habert, Claude Barras, Gilles Adda, Philippe Boula de Mareuil, Patrick Paroubek

The aim of this study is to develop a model of disfluent speech by comparing different types of audio transcripts. The study uses 10 hours of French radio interview archives, involving journalists and public figures from politics and civil society. The first type of transcript is press-oriented, with most disfluencies discarded. For 10% of the corpus, we produced exact audio transcripts in which all audible phenomena and overlapping speech segments are transcribed manually. In these transcripts, about 14% of the words correspond to disfluencies and discourse markers. The audio corpus was then transcribed using the LIMSI speech recognizer. Disfluency words make up 8% of the corpus and account for 12% of the overall error rate. This shows that disfluencies have no major effect on neighboring speech segments. Restarts are the most error-prone class, with a within-class error rate of 36.9%.
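
The three quantities quoted above (a class's share of the corpus, its share of the errors, and its within-class error rate) can be illustrated with a minimal sketch. The function name `class_error_stats`, the input format (one `(word_class, is_error)` pair per reference word), and the toy counts are all assumptions made for illustration; they are not the study's data or its scoring procedure. The toy counts are chosen only so that the disfluency class reproduces the 8% and 12% shares.

```python
from collections import Counter


def class_error_stats(aligned):
    """Per-class statistics from (word_class, is_error) pairs.

    `aligned` holds one pair per reference word; this input format is a
    hypothetical simplification, not the study's alignment format.
    """
    totals = Counter(cls for cls, _ in aligned)
    errors = Counter(cls for cls, err in aligned if err)
    n_words = sum(totals.values())
    n_errors = sum(errors.values())
    stats = {}
    for cls, count in totals.items():
        stats[cls] = {
            # Fraction of all reference words that belong to this class.
            "corpus_share": count / n_words,
            # Fraction of all errors that fall on words of this class.
            "error_share": errors[cls] / n_errors if n_errors else 0.0,
            # Fraction of this class's words that are misrecognized.
            "within_class_error_rate": errors[cls] / count,
        }
    return stats


# Toy data, invented for illustration only: 100 reference words,
# 8 of them disfluencies, 25 errors in total, 3 of them on disfluencies.
toy = (
    [("fluent", False)] * 70
    + [("fluent", True)] * 22
    + [("disfluency", False)] * 5
    + [("disfluency", True)] * 3
)
stats = class_error_stats(toy)
print(stats["disfluency"])
```

Under these assumptions, a class holding 8% of the words and 12% of the errors has a within-class error rate 1.5 times the overall rate, since 0.12 / 0.08 = 1.5.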