ISCA Archive Interspeech 2015

Personalized speech recognizer with keyword-based personalized lexicon and language model using word vector representations

Ching-Feng Yeh, Yuan-ming Liou, Hung-yi Lee, Lin-shan Lee

The popularity of mobile devices offers an ideal platform for personalized recognizers. With data collected from the user, a personalized recognizer with better-matched acoustic and linguistic characteristics can offer not only better recognition accuracy but also reduced computation time. In this paper, we propose a scenario in which a small data set (500 annotated utterances) is collected from each user and used to personalize the recognizer. Based on this scenario, we present an overall framework for improving accuracy and reducing computation time. We train Gaussian Mixture Models (GMMs) on word vector representations [1][2] and develop word clustering and keyword extraction approaches for personalizing the lexicon and language model. Prototype recognition systems with CD-DNN-HMM [3][4][5] acoustic models adapted by fDLR [6][7][8][9] were implemented and tested for 10 target users. The results show that the personalized lexicon can include many more user-specific words that were not obtained before, and that the tradeoff between recognition accuracy and real-time factor improves significantly.
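The abstract does not spell out how the GMMs turn word vectors into word clusters, so the following is only a minimal sketch of one plausible reading: fit a diagonal-covariance GMM to word vectors with EM, then assign each word to its most probable mixture component. The toy 2-D vectors, the vocabulary, the initial means, and the names `fit_gmm` and `responsibilities` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_gmm(vectors, init_means, n_iter=20, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with EM from given initial means."""
    k, dim, n = len(init_means), len(vectors[0]), len(vectors)
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(n_iter):
        # E-step: soft-assign every vector to every component.
        resp = [responsibilities(x, weights, means, variances) for x in vectors]
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)  # guard against empty components
            weights[j] = nj / n
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, vectors)) / nj
                for d in range(dim)
            ]
            variances[j] = [
                max(
                    var_floor,
                    sum(r[j] * (x[d] - means[j][d]) ** 2 for r, x in zip(resp, vectors)) / nj,
                )
                for d in range(dim)
            ]
    return weights, means, variances

# Hypothetical toy word vectors (real word vectors have far more dimensions).
toy_vectors = {
    "taipei": (0.9, 0.1), "kaohsiung": (1.0, 0.2), "tainan": (0.8, 0.0),
    "guitar": (0.1, 0.9), "piano": (0.0, 1.0), "violin": (0.2, 0.8),
}

words = list(toy_vectors)
data = [toy_vectors[w] for w in words]
# Assumption: seed the two components with two of the word vectors.
weights, means, variances = fit_gmm(
    data, init_means=[toy_vectors["taipei"], toy_vectors["guitar"]]
)

# Hard cluster label = index of the component with the highest posterior.
clusters = {}
for w, x in zip(words, data):
    post = responsibilities(x, weights, means, variances)
    clusters[w] = max(range(len(post)), key=post.__getitem__)

print(clusters)
```

In a setting like the one described, clusters of this kind could plausibly be used to decide which groups of words to add to a user's lexicon, but that step depends on details of the framework that the abstract does not give.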