ISCA Archive SSW 1990

Speech synthesis by optimum concatenation of phoneme segments

Tetsuya Nomura, Hideyuki Mizuno, Hirokazu Sato

To achieve a concatenation-type Japanese text-to-speech system, we propose two basic procedures. The first is the use of phoneme segments with multiple tri-phone labels as the fundamental synthesis units. The multiple tri-phone labels effectively increase the variety of the synthesis units. The second is a segment concatenation procedure that takes into account the continuity of feature parameters at the segment junctions. A distortion measure at each segment junction is introduced, which indicates how well the synthesis units are combined. The proposed procedures produce natural and distinct speech.
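
The idea of choosing segments by junction distortion can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the distortion measure or the search method, so the Euclidean distance between boundary feature vectors and the dynamic-programming search below are assumptions, and all segment names and feature values are hypothetical.

```python
import math


def junction_distortion(left_end, right_start):
    """Assumed measure: Euclidean distance between the feature vectors
    on either side of a segment junction."""
    return math.dist(left_end, right_start)


def select_segments(candidates):
    """Pick one segment per phoneme position so that the summed junction
    distortion is minimal (dynamic programming over candidate segments).

    candidates[i] is a list of (name, start_vec, end_vec) tuples for position i.
    Returns (total_distortion, [segment names]).
    """
    # best[j] = (cost, path) of the cheapest sequence ending in candidate j
    best = [(0.0, [name]) for name, _start, _end in candidates[0]]
    for prev_cands, cur_cands in zip(candidates, candidates[1:]):
        new_best = []
        for name, start, _end in cur_cands:
            cost, path = min(
                (best[k][0] + junction_distortion(prev_cands[k][2], start), best[k][1])
                for k in range(len(prev_cands))
            )
            new_best.append((cost, path + [name]))
        best = new_best
    return min(best)


# Hypothetical candidates: two segments for each of three phoneme positions,
# each with 2-dimensional feature vectors at its start and end.
candidates = [
    [("k_1", (0.0, 0.0), (1.0, 0.0)), ("k_2", (0.0, 0.0), (3.0, 0.0))],
    [("a_1", (1.0, 0.0), (2.0, 0.0)), ("a_2", (3.0, 1.0), (2.0, 2.0))],
    [("i_1", (2.0, 0.5), (0.0, 0.0)), ("i_2", (2.0, 2.0), (0.0, 0.0))],
]

total, path = select_segments(candidates)
print(total, path)
```

In this sketch, a larger pool of candidates per position (which multiple tri-phone labels would provide) gives the search more chances to find a low-distortion junction.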