ISCA Archive ICSLP 1992

Segment based variable frame rate speech analysis and recognition using a spectral variation function

Giovanni Flammia, Paul Dalsgaard, Ove Andersen, Borge Lindberg

This paper reports on applications of a Spectral Variation Function to speech analysis and recognition. A Spectral Variation Function is a correlation measure between successive windows of acoustic observation vectors. It is well suited to speech analysis in which acoustic events are associated with segments of variable duration rather than with frames sampled at a fixed rate. The Spectral Variation Function is used in word recognition experiments with continuous-density hidden Markov models (HMMs). The experiments are prepared in two steps. First, each utterance is segmented by a peak-detection algorithm applied to the Spectral Variation Function, and the number of states in each HMM word model is set to the average number of segments found by the function in the training set for that word. Second, a few (3 or 4) vectors per segment are selected as observations for the HMMs.

The frame selection reduces the number of observations by approximately 50% relative to a fixed incoming rate of one frame every 10 ms. As a consequence, the computational load of both training and recognition is greatly reduced, while the computational load of the frame selection algorithm itself is limited. The variable frame rate technique gives recognition accuracy comparable to that of standard HMMs with a fixed 10 ms frame rate. In the multi-speaker, isolated word task experiments (17 speakers, 20 words, 924 test utterances), the variable frame rate technique gives 94.5% accuracy, whereas fixed frame rate HMMs give between 93.4% and 94.0% accuracy.
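The segmentation and frame selection steps described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption: the function is approximated as half of one minus the cosine between the mean vectors of the windows just before and just after each frame, peaks are taken as local maxima above a fixed threshold, and frames are picked at evenly spaced positions within each segment. All names (`spectral_variation`, `find_boundaries`, `select_frames`, `states_for_word`) and parameter values (`half_window`, `threshold`) are hypothetical and are not taken from the paper.

```python
import math


def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def spectral_variation(frames, half_window=2):
    """Assumed form of the function: 0.5 * (1 - cosine) between the mean of
    the half_window frames before t and the half_window frames from t on.
    Positions without a full window on both sides are left at 0.0."""
    svf = [0.0] * len(frames)
    for t in range(half_window, len(frames) - half_window + 1):
        left = mean_vector(frames[t - half_window:t])
        right = mean_vector(frames[t:t + half_window])
        svf[t] = 0.5 * (1.0 - cosine(left, right))
    return svf


def find_boundaries(svf, threshold=0.1):
    """Local maxima of the function above a threshold (assumed peak rule)."""
    return [
        t for t in range(1, len(svf) - 1)
        if svf[t] > threshold and svf[t] > svf[t - 1] and svf[t] >= svf[t + 1]
    ]


def segments_from_boundaries(boundaries, n_frames):
    """Turn boundary indices into half-open (start, end) frame ranges."""
    edges = [0] + list(boundaries) + [n_frames]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


def select_frames(segment, per_segment=3):
    """Pick up to per_segment evenly spaced frame indices inside a segment."""
    start, end = segment
    length = end - start
    k = min(per_segment, length)
    return [start + (2 * i + 1) * length // (2 * k) for i in range(k)]


def states_for_word(segment_counts):
    """Number of HMM states: rounded average segment count over the
    training utterances of one word."""
    return max(1, round(sum(segment_counts) / len(segment_counts)))


# Toy utterance: six frames of one spectral shape, then six of another.
frames = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 6
svf = spectral_variation(frames)
boundaries = find_boundaries(svf)
segments = segments_from_boundaries(boundaries, len(frames))
selected = [select_frames(s) for s in segments]
```

On this toy input the sketch finds a single boundary at the change of spectral shape and keeps three frames from each of the two segments. The toy data was constructed for illustration only and says nothing about the reduction or accuracy figures reported above.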