ISCA Archive Interspeech 2015

Distinct triphone acoustic modeling using deep neural networks

Dongpeng Chen, Brian Mak

To strike a balance between robust parameter estimation and detailed modeling, most automatic speech recognition systems are built with tied-state continuous-density hidden Markov models (CDHMMs). Consequently, triphone states that are tied together in the same tied-state become indistinguishable, inevitably introducing quantization errors. It has been shown that (almost) all distinct triphones can be modeled effectively with a basis approach; two such methods were proposed previously for CDHMMs with Gaussian-mixture states: eigentriphone modeling and reference model weighting (RMW). In this paper, we investigate distinct triphone modeling under the state-of-the-art deep neural network (DNN) framework. Because of the large number of DNN model parameters, regularization is necessary. Multi-task learning (MTL) is first used to train distinct triphone states together with carefully chosen related tasks that serve as a regularizer. The RMW approach is then applied to linearly combine the neural-network weight vectors of the member triphones of each tied-state before the output softmax activation of each distinct triphone state. The method successfully improves phoneme recognition on TIMIT and word recognition on the Wall Street Journal task.
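
To make the RMW step concrete, the NumPy sketch below forms each distinct triphone state's output-layer weight vector as a normalized linear combination of tied-state weight vectors and only then applies the softmax. This is one plausible reading of the combination described in the abstract; the dimensions, the dense combination matrix Lambda, and the function names are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    # Illustrative dimensions (not from the paper): last hidden layer size H,
    # number of tied-states T, number of distinct triphone states D.
    H, T, D = 1024, 2000, 6000
    rng = np.random.default_rng(0)

    # Output-layer weight vectors and biases of the tied-states, which act as
    # the reference models in the RMW combination.
    W_tied = 0.01 * rng.standard_normal((T, H))
    b_tied = np.zeros(T)

    # RMW combination weights: one row per distinct triphone state.  A dense
    # matrix is used here for simplicity; in practice each row would only
    # cover the member tied-states of that triphone's cluster.
    Lambda = rng.random((D, T))
    Lambda /= Lambda.sum(axis=1, keepdims=True)   # normalize each row

    # Each distinct triphone state's output weight vector is a linear
    # combination of the tied-state weight vectors.
    W_distinct = Lambda @ W_tied          # shape (D, H)
    b_distinct = Lambda @ b_tied          # shape (D,)

    def distinct_triphone_posteriors(h):
        """Posteriors over distinct triphone states for one frame, given the
        last hidden-layer activation h of shape (H,)."""
        return softmax(W_distinct @ h + b_distinct)

    h = rng.standard_normal(H)
    p = distinct_triphone_posteriors(h)
    print(p.shape, p.sum())               # (6000,) and approximately 1.0

In this sketch the tied-state weight vectors play the role of the basis, and only the small per-triphone combination matrix is new, which is how a basis approach keeps the number of additional parameters manageable.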