ISCA Archive Eurospeech 2001

Analysis of n-best output hypotheses for fast speech in large vocabulary continuous speech recognition

Tibor Fábián, Thilo Pfau, Günther Ruske

The performance of speech recognition systems often deteriorates considerably with fast speech. Particularly when the recognizer is run in mismatched conditions, e.g. fast speech, the performance can be improved by properly selecting one of the N-best recognition output hypotheses. For the selection of the best hypothesis, different speech rate measures were taken into account. First, to show the potential of the speech rate as a selection criterion, an "ideal" speech rate value is assumed, which is calculated from the known transcription; phoneme rate and vowel rate are compared. Second, a phoneme recognizer is used to estimate the speaking rates of unknown sentences. Tests on the spontaneously spoken German Verbmobil material showed a relative decrease of 6.6% in the word error rate for fast speech when the estimated vowel rate is used, which is almost as good as using the "ideal" vowel rate (relative improvement of 7.64%). Always choosing the most accurate of the N-best output hypotheses shows that the word error rate could ideally be decreased by 26.75%.
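
The selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the speech rate enters the selection, so the sketch assumes the simplest rule, namely picking the hypothesis whose implied vowel rate is closest to a reference rate (either the "ideal" rate from the known transcription or the estimate from a phoneme recognizer). The names `Hypothesis`, `VOWELS`, `vowel_rate` and `select_by_rate`, as well as the toy vowel inventory, are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy vowel inventory; a real system would use the
# vowel symbols of its own phoneme set.
VOWELS = {"a", "e", "i", "o", "u"}


@dataclass
class Hypothesis:
    words: List[str]      # recognized word sequence
    phonemes: List[str]   # phoneme sequence, assumed to come from a lexicon lookup


def vowel_rate(phonemes: List[str], duration_s: float) -> float:
    """Vowels per second implied by a phoneme sequence."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return sum(p in VOWELS for p in phonemes) / duration_s


def select_by_rate(nbest: List[Hypothesis], reference_rate: float,
                   duration_s: float) -> Hypothesis:
    """Return the hypothesis whose vowel rate is closest to reference_rate.

    Assumed rule, not taken from the paper. Ties go to the earlier entry,
    i.e. the better-ranked hypothesis if nbest is sorted by recognizer rank.
    """
    return min(nbest,
               key=lambda h: abs(vowel_rate(h.phonemes, duration_s) - reference_rate))


# Toy usage: a 2-second utterance with two competing hypotheses.
nbest = [
    Hypothesis(["hallo"], ["h", "a", "l", "o"]),
    Hypothesis(["hallo", "aber"], ["h", "a", "l", "o", "a", "b", "e", "r"]),
]
chosen = select_by_rate(nbest, reference_rate=1.9, duration_s=2.0)
```

In this toy case the first hypothesis implies 1.0 vowels per second and the second 2.0, so a reference rate of 1.9 selects the second. A practical system would probably combine the rate distance with the recognizer score rather than use it alone; that combination is left out here because the abstract gives no details on it.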