ISCA Archive ICSLP 1990

A new training method for multi-phone speech units for use in a hidden Markov model speech recognition system

Jade Goldstein, Akio Amano, Hideki Murayama, Mariko Izawa, Akira Ichikawa

This paper describes preliminary results for a new method of training multi-phone units for discrete hidden Markov model speech recognition systems. Context-sensitive multi-phone units, which may be poorly trained, are combined with smaller speech units through a weighting scheme that favors well-trained data. We tested this method for Japanese, using the multi-phone disyllable unit (a VCV pattern in Japanese) and the tripartite disyllable unit. A tripartite disyllable is composed of smaller speech units: a single consonant phone (in the case of Japanese) surrounded by two vowel demiphones (context-sensitive half-phones). For speaker-dependent isolated-word recognition, with training on data from three recording sessions of the same continuous-speech training set, the merged system achieved an average recognition rate of 96.6%. This is a 9.1% improvement in recognition rate over the standard disyllable system and a 0.8% improvement over the tripartite disyllable system.
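The abstract does not give the weighting formula, so the following is only a hypothetical sketch of one way such a merge could work in a discrete HMM. It assumes that each VCV state and its corresponding tripartite state share one vector-quantization codebook, and it interpolates their output distributions with a count-based weight. The function name `merge_output_probs`, the weight `n / (n + k)`, and the constant `k` are all illustrative assumptions, not the authors' scheme.

```python
from typing import List, Sequence


def merge_output_probs(
    vcv_probs: Sequence[float],
    tripartite_probs: Sequence[float],
    vcv_count: int,
    k: float = 10.0,
) -> List[float]:
    """Hypothetical merge of two discrete output distributions (same codebook).

    The weight on the context-sensitive VCV estimate grows with the number of
    training tokens observed for that unit (vcv_count), so poorly trained
    VCV units lean on the better-trained tripartite estimate. The constant k
    is an assumed smoothing parameter, not a value from the paper.
    """
    if len(vcv_probs) != len(tripartite_probs):
        raise ValueError("distributions must share one codebook")
    if vcv_count < 0 or k <= 0:
        raise ValueError("vcv_count must be >= 0 and k must be > 0")

    # Assumed weight: 0 with no VCV training data, approaching 1 as data grows.
    lam = vcv_count / (vcv_count + k)
    merged = [lam * p + (1.0 - lam) * q for p, q in zip(vcv_probs, tripartite_probs)]

    # Renormalize to absorb floating-point rounding.
    total = sum(merged)
    return [m / total for m in merged]


if __name__ == "__main__":
    vcv = [0.7, 0.2, 0.1]         # sparse, context-sensitive estimate
    tripartite = [0.4, 0.4, 0.2]  # smoother estimate from smaller units
    print(merge_output_probs(vcv, tripartite, vcv_count=10, k=10.0))
```

With `vcv_count=10` and `k=10.0` the weight is 0.5, giving approximately `[0.55, 0.3, 0.15]`. A count-based weight is just one simple way to "favor well-trained data"; deleted interpolation, which estimates the weights on held-out data, is a common alternative.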