ISCA Archive Eurospeech 1997

Large vocabulary speech recognition with context dependent MMI-connectionist / HMM systems using the WSJ database

Jörg Rottland, Christoph Neukirchen, Daniel Willett, Gerhard Rigoll

In this paper we present a context-dependent hybrid MMI-connectionist / Hidden Markov Model (HMM) speech recognition system for the Wall Street Journal (WSJ) database. The hybrid system is built from a neural network, which is used as a vector quantizer (VQ), and an HMM with discrete probability density functions, which has the advantage of faster decoding. The neural network is trained with an algorithm that tries to maximize the mutual information between the classes of the input features (e.g. phones, triphones, etc.) and the firing sequence of the network. The system has been trained on the 1992 WSJ corpus (si-84). Tests were performed on the five- and twenty-thousand-word speaker-independent (si_et) tasks. The error rates of a new context-dependent neural network are 29% lower (relative) than the error rates of a standard (k-means) discrete system, and the rates are very close to those of the best continuous/semi-continuous HMM speech recognizers.
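The training criterion named in the abstract, the mutual information between feature classes and the quantizer's output labels, can be illustrated with a small sketch. This is not the authors' training algorithm: it only estimates the quantity being maximized from paired class and VQ label sequences. The function name `mutual_information` and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(classes, labels):
    """Estimate I(C; Y) in bits from paired sequences.

    classes: per-frame class identities (e.g. phone names), hypothetical input.
    labels:  per-frame VQ output indices (the "firing" neuron), hypothetical input.
    """
    if len(classes) != len(labels):
        raise ValueError("sequences must have the same length")
    n = len(classes)
    if n == 0:
        raise ValueError("sequences must not be empty")

    joint = Counter(zip(classes, labels))  # counts of (class, label) pairs
    class_counts = Counter(classes)        # marginal counts of classes
    label_counts = Counter(labels)         # marginal counts of VQ labels

    mi = 0.0
    for (c, y), k in joint.items():
        p_cy = k / n
        # p(c, y) / (p(c) * p(y)) expressed directly in counts
        ratio = k * n / (class_counts[c] * label_counts[y])
        mi += p_cy * math.log2(ratio)
    return mi


# A quantizer whose labels identify the class carries maximal information ...
informative = mutual_information(["a", "b", "a", "b"], [0, 1, 0, 1])
# ... while labels that are independent of the class carry none.
uninformative = mutual_information(["a", "a", "b", "b"], [0, 1, 0, 1])
```

In this toy setting a quantizer that separates the two classes perfectly yields one bit, while a quantizer whose labels are independent of the class yields zero bits; an MMI-trained quantizer is pushed towards the former.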