ISCA Archive Interspeech 2012

Spelling as a complementary strategy for speech recognition

Keith Vertanen, Per Ola Kristensson

We compare a variety of strategies for incorporating spelling to create more robust voice-only speech interfaces. These strategies use different combinations of speaking the word, spelling the word, and spelling the word using a phonetic alphabet. For correcting a single recognition error, either spelling the word or speaking and then spelling it reduced error rates substantially. Phonetic spelling was very accurate, with error rates approaching zero on a 5K task. Most importantly, multiple input strategies could be used simultaneously with only a modest degradation in performance compared to allowing only a single input strategy. Thus, our work shows that spelling-based input strategies offer a simple, natural, and effective way for users to both avoid and correct recognition errors.

Index Terms: speech recognition, error correction
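
As an illustration of the input strategies named in the abstract, the following minimal sketch shows what a user might say for a single word under each strategy. It is not the authors' system: the abstract does not state which phonetic alphabet was used, so the NATO/ICAO alphabet is assumed here, and the function names and strategy labels (`speak`, `spell`, `speak_and_spell`, `phonetic_spell`) are hypothetical.

```python
# Assumption: the NATO/ICAO spelling alphabet. The abstract only says
# "a phonetic alphabet" and does not name one.
NATO = {
    "a": "alfa", "b": "bravo", "c": "charlie", "d": "delta", "e": "echo",
    "f": "foxtrot", "g": "golf", "h": "hotel", "i": "india", "j": "juliett",
    "k": "kilo", "l": "lima", "m": "mike", "n": "november", "o": "oscar",
    "p": "papa", "q": "quebec", "r": "romeo", "s": "sierra", "t": "tango",
    "u": "uniform", "v": "victor", "w": "whiskey", "x": "xray",
    "y": "yankee", "z": "zulu",
}


def spell(word):
    """Return the letter-by-letter spelling of a word."""
    return list(word.lower())


def phonetic_spell(word):
    """Return the word spelled with phonetic-alphabet code words.

    Raises KeyError for characters outside a-z.
    """
    return [NATO[ch] for ch in word.lower()]


def decode_phonetic(tokens):
    """Recover a word from code words by taking each one's first letter."""
    return "".join(token[0] for token in tokens)


def input_forms(word):
    """Token sequences a user might say under each (hypothetical) strategy label."""
    return {
        "speak": [word],
        "spell": spell(word),
        "speak_and_spell": [word] + spell(word),
        "phonetic_spell": phonetic_spell(word),
    }


if __name__ == "__main__":
    for strategy, tokens in input_forms("cat").items():
        print(f"{strategy}: {' '.join(tokens)}")
```

A recognizer that accepts several strategies at once would need to accept any of these token sequences for the same target word, which is the setting the abstract describes as costing only a modest degradation in performance.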