ISCA Archive ICSLP 2002

Recurrent neural network-enhanced HMM speech recognition systems

J. W. F. Thirion, Elizabeth C. Botha

In this paper, we show how speech recognition systems can be improved by using an adaptive model transition penalty term in the Viterbi decoding process. This term is calculated from the phonemic segmentation of the speech signal, where a bi-directional recurrent neural network is used to segment the speech into phonemes. No higher-level lexical knowledge (phoneme sequence) is used in the segmentation process. The method is compared with an existing technique on the state-of-the-art speech recognition system, HTK. It is shown that our technique results in significantly better phoneme recognition accuracy.
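
The abstract does not give the exact form of the penalty term, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes one state per phoneme model, a hypothetical per-frame boundary probability `boundary_prob[t]` standing in for the output of the segmentation network, and a penalty of `scale * log(boundary_prob[t])` added whenever the decoder switches models at frame `t`. All function and parameter names are illustrative.

```python
import math


def viterbi_adaptive_penalty(loglik, boundary_prob, scale=1.0, floor=1e-6):
    """Viterbi decoding with a time-varying model transition penalty.

    loglik[t][p]     : log-likelihood of frame t under phoneme model p
    boundary_prob[t] : assumed segmenter estimate that a phoneme boundary
                       falls at frame t (hypothetical input)
    scale, floor     : illustrative tuning constants, not from the paper
    """
    n_models = len(loglik[0])
    score = list(loglik[0])
    back = []
    for t in range(1, len(loglik)):
        # Switching models is cheap where a boundary is likely, costly elsewhere.
        penalty = scale * math.log(max(boundary_prob[t], floor))
        new_score, pointers = [], []
        for p in range(n_models):
            best_q, best_val = p, score[p]  # staying in the same model is free
            for q in range(n_models):
                if q != p and score[q] + penalty > best_val:
                    best_q, best_val = q, score[q] + penalty
            new_score.append(best_val + loglik[t][p])
            pointers.append(best_q)
        score = new_score
        back.append(pointers)
    # Trace back from the best final model to recover the frame labels.
    state = max(range(n_models), key=lambda p: score[p])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: frame 1 is noisy and slightly favours the wrong model.
loglik = [[-1.0, -3.0], [-2.0, -1.5], [-1.0, -3.0],
          [-3.0, -1.0], [-3.0, -1.0], [-3.0, -1.0]]
segmenter = [0.01, 0.01, 0.01, 0.9, 0.01, 0.01]  # boundary likely at frame 3
no_penalty = [1.0] * 6                            # log(1) = 0, switching is free

print(viterbi_adaptive_penalty(loglik, no_penalty))  # [0, 1, 0, 1, 1, 1]
print(viterbi_adaptive_penalty(loglik, segmenter))   # [0, 0, 0, 1, 1, 1]
```

In this toy case the free-switching decoder inserts a spurious phoneme at the noisy frame, while the segmentation-driven penalty suppresses it and still allows the switch at the frame the segmenter marks as a likely boundary.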