ISCA Archive Eurospeech 1993

Using LVQ to enhance semi-continuous hidden Markov models for phonemes

Mikko Kurimo

Experiments are made to enhance the discrimination ability of semi-continuous hidden Markov models (SCHMMs) by applying Learning Vector Quantization (LVQ). The SCHMMs model phonemes in a speaker-dependent speech recognition application that produces phonetic transcriptions of spoken utterances. The probability density functions of the cepstral feature vectors produced in each state of each model are modeled by mixtures of multivariate Gaussian density functions. The mean vectors of the Gaussian densities are chosen by clustering the feature vectors of the training samples with the Self-Organizing Map (SOM). The Gaussians are then modified to correspond better to the Bayesian decision surfaces between phonemes by tuning the mean vectors with LVQ. The experiments indicate that this careful placement of the mean vectors significantly decreases the recognition error rates of the SCHMMs. LVQ algorithms can also be applied successfully after Baum-Welch or Viterbi training to slightly modify the Gaussians using training samples that would otherwise be recognized incorrectly. This kind of error-corrective tuning also removes some of the recognition errors in the test data.

Keywords: HMMs, LVQ, SOM, semi-continuous
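
The tuning of the Gaussian mean vectors by LVQ described in the abstract can be illustrated with a minimal sketch. The abstract does not say which LVQ variant is used, so the basic LVQ1 rule is assumed here: the mean vector nearest to a labeled training sample moves toward the sample if their phoneme labels agree and away from it otherwise. The function names, the learning-rate schedule, and the toy two-dimensional vectors are hypothetical and serve illustration only. In the paper the initial mean vectors come from SOM clustering, whereas here they are set by hand.

```python
def nearest(codebook, x):
    """Return the index of the mean vector closest to x (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, (mean, _label) in enumerate(codebook):
        d = sum((a - m) ** 2 for a, m in zip(x, mean))
        if d < best_d:
            best, best_d = i, d
    return best


def lvq1_step(codebook, x, label, alpha):
    """Apply one LVQ1 update in place and return the index of the updated mean.

    codebook is a list of (mean_vector, phoneme_label) pairs (assumed layout).
    The nearest mean moves toward x when the labels match, away otherwise.
    """
    i = nearest(codebook, x)
    mean, mean_label = codebook[i]
    sign = 1.0 if mean_label == label else -1.0
    new_mean = [m + sign * alpha * (a - m) for m, a in zip(mean, x)]
    codebook[i] = (new_mean, mean_label)
    return i


def lvq1_train(codebook, samples, alpha0=0.05, epochs=10):
    """Run LVQ1 over labeled samples with a linearly decaying learning rate (assumed schedule)."""
    total = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        for x, label in samples:
            alpha = alpha0 * (1.0 - t / total)
            lvq1_step(codebook, x, label, alpha)
            t += 1
    return codebook


# Toy example: two hand-placed means standing in for SOM-initialized ones.
codebook = [([0.0, 0.0], "a"), ([1.0, 1.0], "i")]
samples = [([0.2, 0.0], "a"), ([0.9, 1.0], "i"), ([0.6, 0.6], "a")]
lvq1_train(codebook, samples)
```

In an SCHMM the tuned mean vectors would then serve as the centers of the Gaussian mixture densities shared by the states. That step is outside the scope of this sketch.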