ISCA Archive Interspeech 2011
ISCA Archive Interspeech 2011

Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers

Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda

Three speech rate control methods for HMM-based speech synthesis were compared by large-scale subjective evaluations. The methods are 1) synthesizing speech sounds based on HMMs trained from corpora at a target speech rate, 2) stretching or shrinking utterance durations proportionally in waveform generation, and 3) determining state durations based on ML criterion under a restriction of utterance duration. The results indicated that the proportional shrinking had significant advantages for fast rate, whereas HMMs trained from slow speech sounds had a slight advantage for slow rate. We also found an advantage of proportionally shrunk speech from a synthesizer trained from slow speech corpora.