ISCA Archive Eurospeech 1997
ISCA Archive Eurospeech 1997

Modelling and decoding of crossword context dependent phones in the Philips large vocabulary continuous speech recognition system

Peter Beyerlein, Meinhard Ullrich, Patricia Wilcox

The performance of the Philips system for large vocabulary continuous speech recognition has been improved significantly by crossword N-phone modelling, enhanced clustering of HMM-states during training, consistent handling of untrained HMM-states during decoding and a new effcient crossword N-phone M-gram decoding strategy. We report word error rate reductions of up to 18% on various ARPA test sets as compared to our best within-word triphone system, based on Laplacian densities, Viterbi decoding and _lterbank-LDA features. The following two issues are addressed: a) Transformation of a tree-organized bigram beam- search decoder into an effcient tree- organized decoder capable of handling long-span acoustic contexts as well as long-span language model contexts. b) State-clustering and generalizing of unseen contexts for the case of Laplacian emission probability density functions.