ISCA Archive ICSLP 1990

A rule-based speech synthesizer using pitch controlled residual wave excitation method

Kazuhiko Iwata, Yukio Mitome, Jun Kametani, Minoru Akamatsu, Seimitsu Tomotake, Kazunori Ozawa, Takao Watanabe

A Japanese text-to-speech conversion system has been developed that generates highly intelligible and natural synthetic speech from arbitrary text written in Kanji characters (Chinese ideographs) by concatenating CV (C: consonant, V: vowel) and VC speech units. The system consists of a text analysis system and a speech synthesizer, both constructed on compact hardware for a personal computer. To generate high-quality synthetic speech, a pitch controlled residual wave excitation method is proposed, which uses residual waves as the excitation signals for a synthesis filter in all portions of each speech unit. To realize natural rhythm, a phoneme duration rule has been created from statistical analysis of a large speech database. Evaluation experiments were carried out on the synthesizer. The 100-syllable articulation test yielded an accuracy of 88.8%, and the 1,000-word phonetically balanced intelligibility test yielded an accuracy of 97.4%.
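The idea of exciting a synthesis filter with a residual wave can be illustrated with a minimal sketch. It assumes an all-pole (linear-prediction style) synthesis filter, which the abstract does not specify, and the function names, the coefficient values, and the truncate-or-zero-pad pitch adjustment are all hypothetical choices made for illustration rather than the authors' method.

```python
def inverse_filter(signal, a):
    """Prediction residual: e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    residual = []
    for n, s in enumerate(signal):
        pred = sum(a[k] * signal[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        residual.append(s - pred)
    return residual


def synthesis_filter(excitation, a):
    """All-pole filter: s[n] = e[n] + sum_k a[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e + pred)
    return out


def retime_pitch_period(residual_period, target_len):
    """Assumed pitch control: truncate or zero-pad one residual period.

    This is an illustrative stand-in; the abstract does not describe
    how the residual wave is adjusted to the target pitch.
    """
    if target_len <= len(residual_period):
        return residual_period[:target_len]
    return residual_period + [0.0] * (target_len - len(residual_period))


# Hypothetical second-order predictor coefficients (stable filter).
coeffs = [0.5, -0.2]
unit = [0.0, 1.0, 0.8, 0.3, -0.4, -0.6, -0.2, 0.1]

residual = inverse_filter(unit, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
```

With the unmodified residual as excitation, `rebuilt` matches `unit` up to floating-point rounding, which is why a residual wave can preserve more of the original speech quality than a simplified excitation signal. Changing the period length with `retime_pitch_period` before filtering alters the pitch while the filter coefficients keep the spectral envelope.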