ISCA Archive Interspeech 2015

Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Marc Evrard, Samuel Delalez, Christophe d'Alessandro, Albert Rilliard

Chironomic stylization is the real-time modification of intonation contours (f0 and tempo) through drawing or writing gestures made with a stylus on a graphic tablet. This research asks whether hand-made intonation stylization improves or degrades expressivity and overall quality compared to statistical modeling of prosody. A system for expressive TTS in French based on hidden Markov models (HMMs) was designed. A neutral corpus and six expressive speech corpora were used (anger, fear, joy, sadness, sensuality, surprise). Five sentences were synthesized with each of the six types of expressivity through constrained maximum likelihood linear regression (CMLLR) adaptation. Using a chironomic system, three trained subjects were asked to modify the synthetic sentences, aiming to improve their expressive quality. Natural, HMM-TTS, and HMM-TTS-Chironomic sentences were evaluated in an expressivity recognition test and a MOS test. The results show that chironomic modification brings significant improvements in both the recognition and MOS tests. These results are discussed in detail, together with the effects of voice quality on the perception of HMM-TTS expressive speech. The two main conclusions are: (i) the intonation of HMM-TTS can be significantly improved; (ii) hand-corrected TTS improves expressivity and overall quality. Chironomic stylization is thus a powerful tool lying between fully automatic TTS and recorded speech.
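To make the idea of gesture-controlled intonation concrete, the following is a minimal sketch, not the authors' system, of how stylus samples might be turned into an f0 contour. It assumes (hypothetically) that the horizontal axis maps to time, that the vertical stylus position is mapped linearly onto a semitone range around a reference pitch, and that y = 0 is the bottom of the tablet. The function names, the ±12 semitone range, and the 100 Hz reference are illustrative choices only; tempo modification is not covered.

```python
def semitones_to_hz(st, ref_hz=100.0):
    """Convert a semitone offset relative to ref_hz into a frequency in Hz."""
    return ref_hz * 2 ** (st / 12.0)


def stylus_to_f0(points, tablet_height, st_range=(-12.0, 12.0), ref_hz=100.0):
    """Hypothetical mapping of stylus samples to an f0 contour.

    points: list of (t_seconds, y_pixels) samples, with y = 0 at the
    bottom of the tablet (an assumption). The vertical position is
    clamped to the tablet and mapped linearly onto st_range, so that
    equal pen movements give equal musical intervals.
    Returns a list of (t_seconds, f0_hz) pairs.
    """
    lo, hi = st_range
    contour = []
    for t, y in points:
        frac = min(max(y / tablet_height, 0.0), 1.0)  # clamp to [0, 1]
        st = lo + frac * (hi - lo)                    # linear position -> semitones
        contour.append((t, semitones_to_hz(st, ref_hz)))
    return contour


if __name__ == "__main__":
    # A rising gesture: bottom, middle, top of a 1000-pixel-high tablet.
    gesture = [(0.0, 0.0), (0.1, 500.0), (0.2, 1000.0)]
    print(stylus_to_f0(gesture, 1000.0))  # one octave below, at, and above 100 Hz
```

A semitone (logarithmic) scale is used here because pitch perception is roughly logarithmic in frequency, so a linear pen movement produces perceptually even intonation changes.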