ISCA Archive Interspeech 2012

Automatic vocabulary adaptation based on semantic similarity and speech recognition confidence measure

Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Out-of-vocabulary (OOV) words are unavoidable in speech recognition because the vocabulary of a recognition dictionary is limited. Automatic vocabulary adaptation, which selects unregistered (i.e., OOV) words from relevant documents and registers them in the dictionary with appropriate probability values, is therefore an important technique. To improve recognition accuracy, a vocabulary adaptation method must register only relevant words that will actually be spoken in the target utterances and must avoid registering words that will not be spoken (i.e., redundant word entries). In this paper, we propose a novel automatic vocabulary adaptation method that satisfies these requirements by using both semantic and acoustic similarity. Acoustic similarity is represented by a speech recognition confidence measure. Experiments show that, compared with conventional methods, our method improves word selection accuracy twofold and improves recognition accuracy on newly registered words by 15.1% in F-measure.

Index Terms: out-of-vocabulary, vocabulary adaptation, semantic similarity, confidence measure
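
As a rough illustration of the idea summarized in the abstract, the sketch below filters candidate OOV words using two scores and then evaluates the selection with an F-measure. The abstract does not state how the two similarities are combined, so the linear interpolation, the `weight` and `threshold` parameters, the `Candidate` type, and all scores and example words are assumptions made for illustration and not the authors' method. The F-measure here is computed over the selected word set, which is simpler than the recognition-level evaluation reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    semantic: float  # hypothetical semantic-similarity score in [0, 1]
    acoustic: float  # hypothetical confidence-based acoustic score in [0, 1]


def select_words(candidates, weight=0.5, threshold=0.5):
    """Keep candidates whose combined score reaches the threshold.

    The weighted sum is an assumption; the abstract gives no combination rule.
    """
    selected = []
    for c in candidates:
        score = weight * c.semantic + (1.0 - weight) * c.acoustic
        if score >= threshold:
            selected.append(c.word)
    return selected


def f_measure(selected, spoken):
    """Harmonic mean of precision and recall of the selected word set."""
    selected, spoken = set(selected), set(spoken)
    hits = len(selected & spoken)
    if hits == 0:
        return 0.0
    precision = hits / len(selected)
    recall = hits / len(spoken)
    return 2 * precision * recall / (precision + recall)


# Made-up candidates: one scores well on both measures, two on only one.
candidates = [
    Candidate("tokamak", semantic=0.9, acoustic=0.8),
    Candidate("plasma", semantic=0.8, acoustic=0.1),
    Candidate("stellarator", semantic=0.3, acoustic=0.9),
]
chosen = select_words(candidates)
```

With these made-up scores, a word that is semantically related but has little acoustic support falls below the threshold, which is the kind of redundant entry the abstract says should not be registered. Raising `threshold` trades recall for precision.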