ISCA Archive ICSLP 2002

Evaluation of a speech recognition / generation method based on HMM and STRAIGHT

Toshio Irino, Yasuhiro Minami, Tomohiro Nakatani, Minoru Tsuzaki, H. Tagawa

We propose a method for integrating speech recognition and generation within a unified framework. The method consists of STRAIGHT, a warped-frequency DCT, and an HMM engine. The warped-frequency DCT derives a form of mel-cepstral coefficients from the smoothed spectrum of STRAIGHT, a vocoder known for its high-quality output. This analysis/synthesis method has the potential to improve performance beyond that of a conventional method using MFCCs derived from the STFT. We evaluated the method through speaker-dependent speech recognition and through perceptual evaluation of sounds generated by HMM-based text-to-speech. The recognition rate using coefficients from the warped DCT of the STRAIGHT spectrum was almost the same as that obtained with conventional MFCCs. The sound quality was sufficiently good for a basic system.
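The idea of a warped-frequency DCT can be illustrated with a minimal sketch. It is not the authors' implementation, and STRAIGHT analysis itself is not shown. It assumes the smoothed spectrum is already available as a log-magnitude spectrum sampled uniformly on [0, π], and that the frequency warping is the first-order all-pass (bilinear) map commonly used for mel-cepstra, with an assumed warping factor α (0.42 is a value often used at 16 kHz). The function names (`warp`, `interp`, `warped_dct_cepstrum`, `log_spectrum_on_warped_axis`) are hypothetical, and linear interpolation with the trapezoidal rule is one possible discretization among several.

```python
import math


def warp(omega, alpha):
    """First-order all-pass (bilinear) frequency warping of omega in [0, pi].

    alpha > 0 stretches low frequencies (mel-like); warp(., -alpha) is the inverse map.
    The warping factor alpha is an assumed parameter, not taken from the paper.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega), 1.0 - alpha * math.cos(omega))


def interp(f, step, ys):
    """Linearly interpolate ys (sampled every `step` rad starting at 0) at frequency f."""
    pos = min(max(f / step, 0.0), len(ys) - 1.0)  # clamp to the sampled range
    i = min(int(pos), len(ys) - 2)
    t = pos - i
    return (1.0 - t) * ys[i] + t * ys[i + 1]


def warped_dct_cepstrum(log_spec, alpha, order):
    """Hypothetical helper: mel-cepstrum-like coefficients c[0..order].

    log_spec holds n >= 2 log-magnitude samples on a uniform grid over [0, pi],
    endpoints included (assumed stand-in for the STRAIGHT smoothed spectrum).
    Approximates c[m] = (1/pi) * integral_0^pi log S cos(m * w~) dw~, where
    w~ = warp(w, alpha), using the trapezoidal rule on the warped axis.
    """
    n = len(log_spec)
    step = math.pi / (n - 1)
    warped_axis = [k * step for k in range(n)]
    # Each uniform point on the warped axis maps back to a linear frequency.
    resampled = [interp(warp(w, -alpha), step, log_spec) for w in warped_axis]
    coeffs = []
    for m in range(order + 1):
        acc = 0.0
        for k, v in enumerate(resampled):
            weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoidal end weights
            acc += weight * v * math.cos(m * warped_axis[k])
        coeffs.append(acc * step / math.pi)
    return coeffs


def log_spectrum_on_warped_axis(coeffs, omega_warped):
    """Inverse direction: log S = c[0] + 2 * sum_{m>=1} c[m] cos(m * w~)."""
    return coeffs[0] + 2.0 * sum(
        c * math.cos(m * omega_warped) for m, c in enumerate(coeffs[1:], start=1)
    )
```

For example, `warped_dct_cepstrum(log_spec, 0.42, 24)` would return 25 coefficients for a 257-bin log spectrum. Sampling uniformly on the warped axis and mapping each point back through `warp(., -alpha)` turns the mel-cepstrum into an ordinary cosine transform over the warped frequency, which is why a DCT-style sum suffices. `log_spectrum_on_warped_axis` sketches the reverse step that a generation system would need to turn coefficients back into a spectral envelope.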