ISCA Archive ICSLP 1996

Quantitative analysis of the local speech rate and its application to speech synthesis

Sumio Ohno, Masamichi Fukumiya, Hiroya Fujisaki

On the basis of the short-time relative speech rate defined by the authors, this paper examines the optimum width of the smoothing window through perceptual experiments on the naturalness of re-synthesized speech. With the optimum window of 270 ms, relative speech rates are obtained for both ‘fast’ and ‘slow’ utterances of the same sentence, using an utterance produced at a ‘normal’ speech rate as the reference. The averaged results show that the speech rate control function for an utterance can be approximately decomposed into a global component for each sentence and local components for each bunsetsu and each major syntactic boundary. Based on these results, a scheme is presented for controlling the local speech rate of a reference utterance to obtain a synthetic utterance of an arbitrary global speech rate.
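
The smoothing and decomposition described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the definition of the relative speech rate, the window shape, or the form of the decomposition. The sketch assumes a frame-wise raw rate sampled every `frame_ms` milliseconds, a rectangular (moving-average) window of 270 ms, and an additive split into a sentence-level mean plus a local residual. The function names `smooth_rate` and `decompose` are hypothetical.

```python
def smooth_rate(raw_rate, frame_ms, window_ms=270.0):
    """Smooth a frame-wise rate with a centered moving average.

    Assumption: rectangular window, truncated at the utterance edges.
    """
    half = int(round(window_ms / frame_ms)) // 2  # frames on each side
    smoothed = []
    for i in range(len(raw_rate)):
        lo = max(0, i - half)
        hi = min(len(raw_rate), i + half + 1)
        smoothed.append(sum(raw_rate[lo:hi]) / (hi - lo))
    return smoothed


def decompose(rate):
    """Split a rate contour into a global component and local deviations.

    Assumption: additive decomposition, with the global component taken
    as the mean over the whole sentence.
    """
    global_component = sum(rate) / len(rate)
    local_component = [r - global_component for r in rate]
    return global_component, local_component


if __name__ == "__main__":
    # Synthetic contour at 10 ms frames: first half at rate 1.0, second at 2.0.
    raw = [1.0] * 50 + [2.0] * 50
    smoothed = smooth_rate(raw, frame_ms=10.0)
    g, local = decompose(smoothed)
    print(g, min(local), max(local))
```

In this sketch a synthetic utterance at a different global rate would be obtained by replacing the global component and adding the local deviations back; whether the local components should also scale with the global rate is a question the abstract leaves to the full paper.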