ISCA Archive Interspeech 2014

Variable Span disfluency detection in ASR transcripts

Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. Narayanan

Natural conversations often involve disfluencies such as revisions, repetitions, interjections, and filled pauses. This paper focuses on word and phrase repetitions and revisions that are lexically well formed. These are generally captured by an automatic speech recognition (ASR) system but pose problems for downstream processing such as spoken language translation (SLT). We describe a system that identifies such word-level disfluencies, with the goal of removing them from the ASR output in real time. We use a span-based training system to exploit contextual information while tagging disfluencies. We design our system on the oracle transcripts and test it on both reference and ASR transcripts. We achieve an area under the receiver operating characteristic (ROC) curve for word-level disfluency detection of 0.93 on the reference transcripts and 0.87 on the ASR transcripts.
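
For readers unfamiliar with the reported metric, the following minimal sketch shows how an area under the ROC curve can be computed from word-level disfluency scores. It is not the authors' evaluation code: the function name `roc_auc`, the example utterance, its labels, and its scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) formulation, in which the area equals the probability that a randomly chosen disfluent word receives a higher score than a randomly chosen fluent word.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a disfluent word, 0 for a fluent word.
    scores: detector confidence that each word is disfluent.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one disfluent and one fluent word")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # disfluent word ranked above fluent word
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical utterance with a repetition: "i want i want a ticket".
# The first "i want" is marked as the disfluent span (assumed labels).
words = ["i", "want", "i", "want", "a", "ticket"]
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.2, 0.5, 0.1, 0.05]  # made-up detector outputs

auc = roc_auc(labels, scores)
print(f"word-level AUC: {auc:.3f}")
```

Because the area depends only on how words are ranked, it summarizes detection quality across all decision thresholds rather than at one operating point.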

doi: 10.21437/Interspeech.2014-600

Cite as: Gupta, R., Ananthakrishnan, S., Yang, Z., Narayanan, S.S. (2014) Variable Span disfluency detection in ASR transcripts. Proc. Interspeech 2014, 2892-2896, doi: 10.21437/Interspeech.2014-600

  author={Rahul Gupta and Sankaranarayanan Ananthakrishnan and Zhaojun Yang and Shrikanth S. Narayanan},
  title={{Variable Span disfluency detection in ASR transcripts}},
  booktitle={Proc. Interspeech 2014},