ISCA Archive Eurospeech 1997

Sub-vector clustering to improve memory and speed performance of acoustic likelihood computation

Mosur Ravishankar, R. Bisiani, E. Thayer

We describe a sub-vector clustering technique to reduce the memory size and computational cost of continuous density hidden Markov models (CHMMs). Acoustic models in modern large-vocabulary continuous speech recognition systems are typically CHMMs. Systems with 100,000 Gaussian distributions of 40-60 dimensions are common; they need several tens of MB of memory, and computing their HMM state likelihoods is several tens of times slower than real time. We show that by clustering and quantizing the Gaussian distributions a few dimensions at a time, both computation and memory costs can be reduced severalfold without significant loss of recognition accuracy. On the 1994 Wall Street Journal 20K test set, this technique reduced the acoustic model size by a factor of 9-10 and the HMM state output likelihood computation time by a factor of 4-5.
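
The idea of quantizing Gaussians a few dimensions at a time can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussians, uses plain Euclidean k-means over concatenated mean and variance values as a stand-in clustering criterion, and every name in it (`quantize`, `log_likelihoods`, `sub_dim`, `codebook_size`) is hypothetical.

```python
import math
import random

def sub_log_density(x, mean, var):
    """Log density of a diagonal Gaussian over one sub-vector."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def kmeans(points, k, iters=20, seed=0):
    """Plain Euclidean k-means (an assumed criterion); returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def quantize(means, variances, sub_dim, codebook_size):
    """Cluster each sub-vector of (mean, variance) parameters separately."""
    dim = len(means[0])
    codebooks, codes = [], []
    for start in range(0, dim, sub_dim):
        end = min(start + sub_dim, dim)
        # One point per Gaussian: its means and variances in this sub-vector.
        points = [m[start:end] + v[start:end] for m, v in zip(means, variances)]
        centroids, assign = kmeans(points, codebook_size)
        width = end - start
        codebooks.append([(c[:width], c[width:]) for c in centroids])
        codes.append(assign)  # one codeword index per Gaussian
    return codebooks, codes

def log_likelihoods(x, codebooks, codes, sub_dim):
    """Score every Gaussian by summing precomputed per-codeword scores."""
    n_gauss = len(codes[0])
    totals = [0.0] * n_gauss
    for s, (book, code) in enumerate(zip(codebooks, codes)):
        start = s * sub_dim
        xs = x[start:start + sub_dim]
        # Evaluate each codeword once per frame, then look it up per Gaussian.
        table = [sub_log_density(xs, m, v) for m, v in book]
        for g in range(n_gauss):
            totals[g] += table[code[g]]
    return totals
```

In this sketch each Gaussian is stored as one codeword index per sub-vector, and scoring a frame costs `codebook_size` density evaluations per sub-vector plus one table lookup per Gaussian per sub-vector, rather than a full density evaluation per Gaussian. The approximation is exact only when every Gaussian gets its own codeword; smaller codebooks trade accuracy for memory and speed.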