ISCA Archive DiaPro 1999

Understanding recognition failures in spoken corrections in human-computer dialogue

Gina-Anne Levow

Miscommunication in speech recognition systems is unavoidable, but a detailed characterization of user corrections will enable speech systems to identify when a correction is taking place and to recognize the content of correction utterances more accurately. In this paper we investigate how users adapt when they encounter recognition errors in interactions with a voice-in/voice-out spoken language system. In analyzing more than 300 pairs of original and repeat correction utterances, matched on speaker and lexical content, we found overall increases in both utterance and pause duration from original to correction. Here we focus on those adaptations, phonological and durational, that are most likely to adversely affect the accuracy of speech recognizers and that help explain the observed decrease in recognition accuracy on spoken corrections. We identify several phonological shifts from conversational to clear speech style. In addition, we compare the observed durations of user utterances from the field trial to those predicted by a speech recognizer’s underlying model. We determine that while words in all positions may increase in duration in spoken corrections, those in final position are significantly more strongly affected than those in non-final position. Furthermore, we find that divergence from predicted duration is more marked in corrections of misrecognition errors than in corrections of rejection errors. These systematic changes argue for a general hierarchical model of pronunciation and duration that extends beyond the word or sentence level to incorporate higher-level features from discourse or dialogue.