SPHINX-II is designed for large-vocabulary, speaker-independent continuous speech recognition and is based on semi-continuous hidden Markov models. In the November 1992 ARPA speech evaluation, SPHINX-II achieved the lowest error rate (5%). This paper concentrates on the techniques that made SPHINX-II successful and that distinguish it from other systems. Specifically, these are senonic decision trees for acoustic modeling, a multi-pass decoder that meets the challenge of very-large-vocabulary recognition, and a unified stochastic engine that jointly optimizes the acoustic and language models.
Keywords: Shared-distribution models, senones, decision trees, multi-pass decoder, unified stochastic engine