ISCA Archive Interspeech 2015

Noise-matched training of CRF based sentence end detection models

Madina Hasan, Rama Doddipatla, Thomas Hain

Sentence end detection (SED) is an important task for many applications and has been studied on both written text and automatic speech recognition (ASR) transcripts. Previous work showed that conditional random field (CRF) models give the best SED performance on a range of tasks, with and without prosodic features. So far, true transcripts have been used for both training and evaluation of SED models. On noisy ASR transcripts, however, performance degrades significantly, especially at medium to high ASR error rates. In this work we demonstrate the correlation of SED performance with word error rate (WER) at different ASR system performance levels. A new method is introduced for transferring SED labels onto noisy ASR transcripts so that noise-matched SED models can be trained. The proposed method significantly improves the performance of SED models, providing an 11% relative gain in slot error rate compared with models trained on true transcripts. The paper further investigates the effect of different features on noise-matched SED models. The impact of textual features is observed to reduce significantly when ASR performance is low, whereas prosodic features still have a noticeable impact.
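
The abstract does not spell out how SED labels are transferred onto ASR output, so the following is only an illustrative sketch of one plausible approach, not the authors' method. It aligns reference and hypothesis word sequences with Python's `difflib.SequenceMatcher` (a longest-matching-block alignment, not a minimum edit distance alignment) and projects each sentence-end label onto the nearest aligned hypothesis word. The function name `transfer_sentence_ends` and the boolean label encoding are assumptions made for this example.

```python
from difflib import SequenceMatcher


def transfer_sentence_ends(ref_words, ref_ends, hyp_words):
    """Project sentence-end labels from reference words onto ASR words.

    ref_ends[i] is True if a sentence ends after ref_words[i].
    Returns one boolean per hypothesis word.
    (Hypothetical helper; the label encoding is an assumption.)
    """
    hyp_ends = [False] * len(hyp_words)
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Correctly recognised words keep their reference labels.
            for k in range(i2 - i1):
                if ref_ends[i1 + k]:
                    hyp_ends[j1 + k] = True
        elif any(ref_ends[i1:i2]):
            # Error region (replace/delete) containing a sentence end:
            # attach it to the last hypothesis word of the region, or to
            # the preceding hypothesis word if the region has none.
            # Several sentence ends in one region collapse into one.
            target = j2 - 1 if j2 > j1 else j1 - 1
            if target >= 0:
                hyp_ends[target] = True
    return hyp_ends


# Example: "am" is deleted by the (simulated) recogniser.
ref = "how are you i am fine thanks".split()
ends = [False, False, True, False, False, False, True]
hyp = "how are you i fine thanks".split()
labels = transfer_sentence_ends(ref, ends, hyp)
```

The resulting word/label pairs could then serve as training data for a noise-matched sequence model. Attaching a boundary inside an error region to the nearest hypothesis word is a design choice of this sketch; other tie-breaking rules are equally defensible.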