ISCA Archive ECST 1987

Speech decoding using Markov model: search for a prior criterion of quality

S. Soudoplatoff

Among the possible ways of applying Markov models to speech recognition, a distinction exists between systems that deal with continuous parameters, i.e. vectors lying in R^n such as spectrum coefficients, LPC, etc., using parametric density functions, and systems that map these values onto a finite vocabulary of very small size (a few hundred elements), usually by a procedure known as vector quantization, or labelling. This paper presents a set of experiments that were run both to compare various types of labelling and to search for a prior criterion of the quality of a labelling, thus avoiding the need to go through a complete experiment (training of the parameters, then decoding) every time the input parameters or the labels are changed. The standard labelling, which is the reference here, consists in clustering a set of vectors to obtain prototypes, using K-means-type algorithms, then assigning each vector to the nearest class center according to a Euclidean distance. Other labels are obtained by changing one or both elements of the metric space (acoustic vectors, metric). The experiments were run on parts of a 200,000-word, isolated-syllable speech dictation system for French, which is under research at the Paris Scientific Center. They consist of a phonetic decoding using a sub-optimal strategy. Criteria are obtained either by considering the labels as the output of a channel, or from contingency tables. For each set of labels, two different hypotheses were made, depending on whether one considers the outputs to be labels or strings of labels. Criteria related to information theory, such as mutual information, or related to data analysis, such as Phi square or Jordan, are computed. Actual results show that error rates lower than those obtained with the reference labels can be achieved, and that the results are globally consistent with the criteria.
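As an illustration only, the reference labelling and two of the criteria named above can be sketched in a few lines of Python. The abstract does not give the formulas actually used, so this sketch assumes the textbook definitions: nearest-prototype assignment under squared Euclidean distance, mutual information in bits, and Phi square as chi-square divided by the sample size. It also assumes that the contingency table crosses a reference phonetic class with the label assigned to each frame. All function names are hypothetical, and the Jordan criterion is not shown.

```python
import math
from collections import Counter

def nearest_label(vector, prototypes):
    """Index of the prototype closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((v - p) ** 2 for v, p in zip(vector, prototypes[k])),
    )

def contingency(classes, labels):
    """Joint counts n[(class, label)] over paired observations."""
    return Counter(zip(classes, labels))

def _margins(table):
    """Row totals, column totals and grand total of a contingency table."""
    rows, cols = Counter(), Counter()
    for (r, c), count in table.items():
        rows[r] += count
        cols[c] += count
    return rows, cols, sum(table.values())

def mutual_information(table):
    """Mutual information (bits) between the two axes of the table."""
    rows, cols, n = _margins(table)
    return sum(
        count / n * math.log2(count * n / (rows[r] * cols[c]))
        for (r, c), count in table.items()
        if count > 0
    )

def phi_square(table):
    """Phi square = chi-square / n = sum(n_ij^2 / (n_i. * n_.j)) - 1."""
    rows, cols, _ = _margins(table)
    return sum(
        count ** 2 / (rows[r] * cols[c]) for (r, c), count in table.items()
    ) - 1.0

# Toy usage: two prototypes (assumed already obtained by clustering),
# four frames, each paired with a reference phonetic class.
prototypes = [(0.0, 0.0), (10.0, 10.0)]
frames = [(1.0, 2.0), (0.5, -1.0), (9.0, 8.0), (11.0, 10.5)]
classes = ["a", "a", "i", "i"]
labels = [nearest_label(f, prototypes) for f in frames]
table = contingency(classes, labels)
print(labels, mutual_information(table), phi_square(table))
```

Under these assumptions, both criteria can be computed from the labelled training data alone, without training a model or running a decoder, which is the sense in which such a criterion is "prior".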