ISCA Archive Interspeech 2012

Simultaneous discriminative training and mixture splitting of HMMs for speech recognition

Muhammad Ali Tahir, Markus Nussbaum-Thom, Ralf Schlüter, Hermann Ney

A method is proposed to incorporate mixture density splitting into the discriminative training of acoustic models for speech recognition. The standard approach is to obtain a high-resolution acoustic model by maximum likelihood training and density splitting, and then to improve this model by discriminative training. We choose a log-linear form of the acoustic model because, for a single Gaussian density per triphone state, log-linear maximum mutual information (MMI) optimization is a convex optimization problem; further splitting and discriminative training of this model then yields a higher-complexity model. We previously showed that this approach achieves large gains in the objective function and corresponding moderate gains in word error rate on a large-vocabulary corpus. This paper incorporates the state-of-the-art minimum phone error (MPE) training criterion into the framework and shows that, after discriminative splitting, a subsequent log-linear MPE training achieves better results than Gaussian mixture model MPE optimization alone.
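The convexity argument rests on the fact that a Gaussian log-density can be rewritten as a log-linear score with first- and second-order features. The sketch below illustrates this for 1-D features at the frame level only; all function names are hypothetical, and the paper's MMI and MPE criteria are sequence-level rather than frame-level. Because each state score is linear in its parameters, the log-posterior of the correct state is a linear term minus a log-sum-exp, which is concave in the parameters. The mean-perturbation splitting rule shown is a common maximum likelihood heuristic assumed here for illustration; the abstract does not specify the splitting scheme used.

```python
import math

def gaussian_logpdf(x, mu, var):
    """Log-density of a 1-D Gaussian N(x | mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def gaussian_to_loglinear(mu, var):
    """Return (alpha, lam1, lam2) with log N(x|mu,var) == alpha + lam1*x + lam2*x**2."""
    lam2 = -0.5 / var
    lam1 = mu / var
    alpha = -0.5 * (math.log(2 * math.pi * var) + mu * mu / var)
    return alpha, lam1, lam2

def loglinear_score(params, x):
    """Log-linear state score with features (1, x, x^2); parameters are unconstrained."""
    alpha, lam1, lam2 = params
    return alpha + lam1 * x + lam2 * x * x

def log_posteriors(models, x):
    """Log state posteriors log p(s|x), computed with a stable log-sum-exp."""
    scores = [loglinear_score(m, x) for m in models]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_z for s in scores]

def frame_mmi(models, frames):
    """Frame-level MMI-style objective: sum of log p(correct state | x)."""
    return sum(log_posteriors(models, x)[s] for x, s in frames)

def split_density(mu, var, eps=0.2):
    """Split one Gaussian into two by shifting the mean by +/- eps * std
    (assumed heuristic, not taken from the paper)."""
    d = eps * math.sqrt(var)
    return [(mu - d, var), (mu + d, var)]

if __name__ == "__main__":
    # Two hypothetical single-Gaussian states converted to log-linear form.
    models = [gaussian_to_loglinear(-1.0, 1.0), gaussian_to_loglinear(1.0, 1.0)]
    frames = [(-1.2, 0), (0.8, 1), (0.1, 1)]
    print("frame-level MMI objective:", frame_mmi(models, frames))
    print("split of N(0, 4):", split_density(0.0, 4.0))
```

Once converted, the log-linear parameters are no longer tied to a normalized Gaussian, which is what allows them to be optimized freely by a discriminative criterion.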

Index Terms: speech recognition, log-linear modelling, discriminative training