ISCA Archive Eurospeech 1993

Combination of training criteria to improve continuous speech recognition

Laurence Devillers, Christian Dugast

This paper is concerned with combining different learning criteria to improve continuous speech recognition performance. The Maximum Likelihood Estimation (MLE) criterion, used with the Viterbi algorithm in Hidden Markov Models (HMMs), and the discriminant Mean Squared Error (MSE) criterion, generally used with the gradient descent algorithm in a neural network such as the Time Delay Neural Network (TDNN), lead to different learning strategies. Each criterion generates its own internal representations and produces different classification errors. Combining both models in the recognition phase therefore makes it possible to eliminate several misclassifications. Experiments have been conducted with such a combined TDNN/HMM system involving the MLE and MSE learning criteria. The test data are part of the DARPA Resource Management Speaker Dependent database. The neural component is a hierarchical structure of TDNNs, which makes training feasible for a large database on Unix workstations. The hybrid system linearly combines the TDNN output scores and the HMM probabilities during the recognition phase in order to minimize classification errors. Such a system leads to a performance improvement of 15% to 20% compared with state-of-the-art HMM systems.

Keywords: hierarchical structure of TDNNs, combined system TDNN/HMM
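
The linear combination of TDNN output scores and HMM probabilities described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the exact form of the combination nor its weights, so everything below is an assumption made for illustration: the names `combine_scores` and `classify`, the single interpolation weight `weight`, the choice to combine scores in the log domain, and the toy score values.

```python
import math

def combine_scores(hmm_log_probs, tdnn_log_scores, weight=0.5):
    """Linearly interpolate per-class HMM and TDNN scores.

    Assumption: both inputs are log-domain scores, one per candidate class,
    and a single weight in [0, 1] balances the two models.
    """
    if len(hmm_log_probs) != len(tdnn_log_scores):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * h + (1.0 - weight) * t
            for h, t in zip(hmm_log_probs, tdnn_log_scores)]

def classify(hmm_log_probs, tdnn_log_scores, weight=0.5):
    """Return the index of the class with the highest combined score."""
    combined = combine_scores(hmm_log_probs, tdnn_log_scores, weight)
    return max(range(len(combined)), key=combined.__getitem__)

# Hypothetical scores for three candidate classes (not taken from the paper).
HMM_LOG_PROBS = [math.log(p) for p in (0.5, 0.4, 0.1)]
TDNN_LOG_SCORES = [math.log(p) for p in (0.2, 0.7, 0.1)]

if __name__ == "__main__":
    print("HMM only :", classify(HMM_LOG_PROBS, TDNN_LOG_SCORES, weight=1.0))
    print("TDNN only:", classify(HMM_LOG_PROBS, TDNN_LOG_SCORES, weight=0.0))
    print("Combined :", classify(HMM_LOG_PROBS, TDNN_LOG_SCORES, weight=0.5))
```

In this toy case the two models disagree, and the combined score follows the model that is more confident. In practice the weight would presumably be tuned on held-out data.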