ISCA Archive Interspeech 2012

Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

In this study, we investigated the speaker's intentions that listeners perceive through subtly different sentence-final intonations. Approximately 2,000 sentence utterances were recorded, and the fundamental frequency (F0) contours at the last vowel of those sentences were classified using a standard clustering algorithm. Various F0 contours were found: not only simple rising and falling intonations but also rise-fall and fall-rise intonations. To reveal the relationship between intonation and intention, 10 representative contours were selected on the basis of the clustering results, and a subjective evaluation was conducted using them. Six Japanese sentences whose meaning could change according to the sentence-final intonation were synthesized, and the F0 contour at the last vowel of each sentence was replaced with each of the selected contours. The evaluation by nine listeners showed that, for example, a certain falling intonation could express the intention of "conviction," while another that differed slightly in shape could convey "doubt." The subtle differences in sentence-final F0 shape were thus found to convey various nuances and connotations.

Index Terms: speech synthesis, sentence-final intonation, speaker's intention
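
As an illustration of the clustering step described in the abstract, the following is a minimal sketch of how final-vowel F0 contours might be grouped by shape and one representative picked per group. The abstract says only that "one of the standard clustering algorithms" was used, so k-means is an assumption here, as are the resampling to 10 points, the semitone normalization relative to each contour's first value, the farthest-first initialization, and all function names. The contours are synthetic values made up for the example, not data from the study.

```python
import math

def resample(contour, n=10):
    """Linearly interpolate a list of F0 values (Hz) to n equally spaced points."""
    if len(contour) == 1:
        return [float(contour[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def to_semitones(contour):
    """Express F0 in semitones relative to the first point (assumed normalization)."""
    ref = contour[0]
    return [12 * math.log2(f / ref) for f in contour]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def representatives(points, labels, centroids):
    """For each cluster, return the index of the member closest to its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        idx = [i for i, l in enumerate(labels) if l == j]
        if idx:
            reps[j] = min(idx, key=lambda i: sq_dist(points[i], c))
    return reps

# Synthetic final-vowel contours in Hz (hypothetical values for illustration only).
contours = [
    [200, 210, 225, 245, 270],  # rising
    [180, 190, 204, 222, 245],  # rising
    [220, 205, 190, 175, 160],  # falling
    [250, 232, 215, 198, 182],  # falling
    [200, 230, 250, 225, 190],  # rise-fall
    [180, 208, 226, 203, 172],  # rise-fall
    [220, 195, 180, 200, 230],  # fall-rise
    [240, 212, 196, 218, 250],  # fall-rise
]

points = [to_semitones(resample(c)) for c in contours]
labels, centroids = kmeans(points, k=4)
reps = representatives(points, labels, centroids)

for cluster, index in sorted(reps.items()):
    print(f"cluster {cluster}: representative contour #{index} -> {contours[index]}")
```

In a setting like the one the abstract describes, the number of clusters would be chosen to suit the recorded data (the study selected 10 representative contours), and each representative would then replace the final-vowel F0 contour of the synthesized test sentences.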