ISCA Archive Eurospeech 1993

New words: implications for continuous speech recognition

I. Lee Hetherington, Victor W. Zue

The goal of this paper is to understand issues related to the new-word problem in continuous speech recognition, so that we may be able to provide better acoustic and language models to facilitate the detection of new words. We define new words as those outside of the system's vocabulary. Specifically, we present experimental results quantifying the likelihood of encountering new words in several different recognition tasks. We show that the rate of new-word occurrence depends on the type of task, and can remain significant even for very large system vocabularies. We also investigate cross-task vocabulary coverage to assess the feasibility of building task-independent vocabularies to reduce the need for task-dependent training data. Finally, we examine the syntactic part-of-speech distribution as well as phonological properties of new words.
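As an illustration of what "rate of new-word occurrence" can mean in practice, the following minimal sketch measures the fraction of test tokens that fall outside a vocabulary built from the most frequent training words. It is not the authors' procedure: the function name `new_word_rate`, the frequency-based vocabulary selection, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter


def new_word_rate(train_tokens, test_tokens, vocab_size):
    """Return the fraction of test tokens outside the vocabulary.

    The vocabulary is assumed (for illustration only) to be the
    `vocab_size` most frequent words in the training tokens.
    """
    counts = Counter(train_tokens)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    new_words = sum(1 for word in test_tokens if word not in vocab)
    return new_words / len(test_tokens)


# Toy data, invented for this sketch; not taken from any corpus in the paper.
train = "show me flights from boston to denver show me fares to boston".split()
test = "show me flights to dallas from boston".split()

# New-word rate for a few hypothetical vocabulary sizes.
rates = {size: new_word_rate(train, test, size) for size in (2, 4, 8)}
for size, rate in rates.items():
    print(f"vocabulary size {size}: new-word rate {rate:.3f}")
```

Applying the same function with training and test tokens drawn from different tasks would give a rough view of cross-task vocabulary coverage, under the same assumptions.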