ISCA Archive Interspeech 2012

Estimating word-stability during incremental speech recognition

Ian McGraw, Alexander Gruenstein

Many speech user interfaces can be improved by incrementally displaying or interpreting a speech recognizer's current best path as a user speaks. This gives rise to a problem of instability, whereby the best path may change frequently, particularly with respect to the words most recently spoken. Introducing a lag between the audio most recently processed and the portion of the best path shown to the user can lead to more usable incremental results. In the ideal case, the lag introduced would vary so as to recover exactly the longest stable prefix of the best path. In this paper, we introduce a framework for estimating a stability statistic for each word, and explore the tradeoff between stability and lag by thresholding stability statistics estimated using a variety of features.
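The thresholding idea can be illustrated with a minimal sketch. It assumes each word on the current best path already carries a stability estimate in [0, 1]; how that estimate is computed is not shown here. The function name `stable_prefix`, the example words, and the scores are hypothetical and chosen for illustration only, not taken from the paper.

```python
from typing import List, Tuple


def stable_prefix(best_path: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the longest prefix of the best path whose words all meet the threshold.

    best_path: (word, stability estimate) pairs in spoken order (assumed input format).
    threshold: minimum stability estimate a word needs in order to be shown.
    """
    prefix = []
    for word, stability in best_path:
        if stability < threshold:
            # Stop at the first unstable word: every later word is withheld too,
            # since only a prefix of the best path is displayed.
            break
        prefix.append(word)
    return prefix


# Hypothetical partial result; the scores are made up for illustration.
partial = [("call", 0.98), ("john", 0.91), ("smith", 0.40), ("now", 0.75)]

shown_strict = stable_prefix(partial, threshold=0.9)  # withholds more words (more lag)
shown_loose = stable_prefix(partial, threshold=0.3)   # shows more words (less stable)
```

Under these assumptions, raising the threshold withholds more of the recent words, which increases lag but lowers the chance that displayed text later changes; lowering it does the opposite.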