ISCA Archive Interspeech 2012

A study on using word-level HMMs to improve ASR performance over state-of-the-art phone-level acoustic modeling for LVCSR

I-Fan Chen, Chin-Hui Lee

In this paper, we propose word-level hidden Markov models (HMMs) to supplement state-of-the-art phone-based acoustic modeling in order to enhance the performance of automatic speech recognition (ASR) systems. Each word in the vocabulary is initially modeled by well-trained triphone models. Maximum a posteriori adaptation is then applied to generate models for words with a large number of occurrences in the training set, so that the acoustic distribution of these words can be modeled more precisely. Experimental results show that the proposed word-based systems outperform phone-based systems on the TIMIT task, which has a small training corpus. On tasks with plenty of training data, such as the WSJ task, word-based systems still show improvements over phone-based systems. Furthermore, the word-based systems discriminate short words and homophones better, and they are more robust to language model weight variation than conventional phone-based systems.

Index Terms: word-level HMM, automatic speech recognition, detection-based ASR, language model weight, homophone
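
The maximum a posteriori adaptation step mentioned in the abstract can be illustrated with the standard MAP update for a single Gaussian mean. This is only a minimal sketch under stated assumptions: the abstract does not say which parameters are adapted or how the prior is weighted, so the function name `map_adapt_mean`, the prior weight `tau`, and its default value are hypothetical and not taken from the paper.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Standard MAP update of one Gaussian mean vector (illustrative only).

    prior_mean : mean from the triphone-initialized word model (list of floats)
    frames     : feature vectors aligned to the word (list of lists of floats)
    posteriors : per-frame occupation probabilities for this Gaussian
    tau        : prior weight; a hypothetical value, not from the paper
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # soft count of frames assigned to this Gaussian
    weighted_sum = [0.0] * dim
    for x, g in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += g * x[d]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Ten frames at [2.0, 4.0] with full occupancy pull the mean halfway from the prior.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0]] * 10, [1.0] * 10, tau=10.0)
# With no adaptation data the prior mean is returned unchanged.
unchanged = map_adapt_mean([1.5, -0.5], [], [], tau=10.0)
```

In this form, a word with few training occurrences stays close to its triphone-based prior, while a word with many occurrences moves toward its own data, which is in line with the abstract restricting adaptation to frequently occurring words.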