ISCA Archive Eurospeech 1993

A new model of excitation for text-to-speech synthesis

Yasushi Ishikawa, Tadashi Ebihara, Kunio Nakajima

This paper describes a new excitation model for text-to-speech synthesis. A periodic pulse train is widely used as the excitation for voiced speech in vocoders. However, it is difficult to synthesize high-quality speech with this simplified model. We propose a new model that represents the residual signal as averaged features plus their fluctuations. In this method, a spectral feature and an averaged pitch period are obtained from the residual signal, and the excitation signal is generated from these parameters with a fluctuation component added. Analysis of many sentence utterances shows a clear quantitative relation between the spectral fluctuation and the energy of speech; the fluctuation component in the model is therefore controlled by energy. An evaluation experiment shows that high-quality synthetic speech is obtained with the model. Furthermore, informal listening test results are presented that confirm the method is effective in text-to-speech synthesis.
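The contrast between a plain periodic pulse train and an excitation built from averaged parameters plus an energy-controlled fluctuation can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract refers to spectral fluctuation of the residual, whereas this simplified example perturbs only pulse timing and amplitude in the time domain. The function names, the `scale_fn` mapping from energy to fluctuation magnitude, and the clamping range are all hypothetical and chosen for illustration only.

```python
import random


def pulse_train(n_samples, period):
    """Baseline vocoder excitation: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def fluctuating_excitation(n_samples, mean_period, energy, scale_fn, seed=0):
    """Pulses placed at an averaged pitch period with a random fluctuation.

    `scale_fn(energy)` is a hypothetical, caller-supplied mapping from speech
    energy to fluctuation magnitude; the abstract does not give its form.
    """
    if mean_period < 2:
        raise ValueError("mean_period must be at least 2 samples")
    rng = random.Random(seed)
    # Clamp the magnitude (assumed range) so each pulse interval stays positive.
    scale = min(max(scale_fn(energy), 0.0), 0.5)
    out = [0.0] * n_samples
    pos = 0.0
    while int(pos) < n_samples:
        # Perturb the pulse amplitude around its average value of 1.0.
        out[int(pos)] = 1.0 + scale * rng.uniform(-1.0, 1.0)
        # Perturb the interval to the next pulse around the averaged period.
        pos += mean_period * (1.0 + scale * rng.uniform(-1.0, 1.0))
    return out
```

With `scale_fn=lambda e: 0.0` the second function reduces to the plain pulse train, which makes the fluctuation component easy to isolate when comparing the two excitations.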

Keywords: excitation, vocoder, text-to-speech synthesis