ISCA Archive ICSLP 1990

Text-to-speech synthesis using a natural voice source

Stephen D. Pearson, Hector R. Javkin, Kenji Matsui, Takahiro Kamai

Our aim is to improve the naturalness of text-to-speech synthesis and its ability to model individual speakers. This paper describes several methods for using inverse-filtered waveforms from natural speech as the voice source in a text-to-speech system. One method repeats a loop of the source waveform and controls pitch by interpolating between samples of the waveform. Another method builds a source waveform of the desired pitch by concatenating single pulses drawn from a collection of pulses. Listening tests compared these methods with each other and with more traditional voice-source generation techniques. The results indicate that these "natural glottal source" methods can substantially improve the quality of text-to-speech synthesis.
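The two source-generation methods can be illustrated with a minimal plain-Python sketch. It rests on several assumptions that the abstract does not specify. The stored loop is taken to be exactly one pitch period of an already inverse-filtered waveform. Pitch is controlled with linear interpolation. Pulses are selected by whichever length is closest to the target period. No amplitude normalization or smoothing is applied at the joins. The function names `loop_source` and `concat_pulses` are hypothetical, and the inverse filtering step itself is not shown.

```python
import math
from typing import List, Sequence


def loop_source(period: Sequence[float], f0_contour: Sequence[float],
                fs_hz: float) -> List[float]:
    """Loop method (sketch): cycle through one stored period, reading it at a
    rate set by the target f0 and linearly interpolating between samples.
    Produces one output sample per entry of f0_contour (Hz)."""
    n = len(period)
    out: List[float] = []
    pos = 0.0  # fractional read position inside the stored period
    for f0 in f0_contour:
        i0 = int(math.floor(pos))
        frac = pos - i0
        i1 = (i0 + 1) % n  # wrap around so the loop is continuous
        out.append((1.0 - frac) * period[i0] + frac * period[i1])
        # One target period lasts fs/f0 output samples, so advance n*f0/fs.
        pos = (pos + n * f0 / fs_hz) % n
    return out


def concat_pulses(pulses: Sequence[Sequence[float]],
                  f0_contour: Sequence[float], fs_hz: float) -> List[float]:
    """Pulse method (sketch): repeatedly append the stored single pulse whose
    length is closest to the target period at the current output position."""
    if not pulses or any(len(p) == 0 for p in pulses):
        raise ValueError("need a non-empty collection of non-empty pulses")
    n_samples = len(f0_contour)
    out: List[float] = []
    while len(out) < n_samples:
        target = fs_hz / f0_contour[len(out)]  # target period in samples
        best = min(pulses, key=lambda p: abs(len(p) - target))
        out.extend(best)
    return out[:n_samples]  # trim the last pulse to the requested length


# Example: a toy 4-sample "period" played at two pitches, and a two-pulse
# collection driven by a rising f0 contour (all values are illustrative).
print(loop_source([0.0, 1.0, 2.0, 3.0], [1.0] * 4, 8.0))
print(concat_pulses([[1.0] * 3, [2.0] * 5], [2.0] * 5 + [3.3] * 6, 10.0))
```

In this sketch the loop method reuses a single waveform shape at every pitch, stretching or compressing it. The pulse method keeps each stored pulse's shape intact but quantizes the realized period to the pulse lengths available in the collection.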