ISCA Archive SpeechProsody 2012

Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model

Keikichi Hirose, Hiroya Hashimoto, Jun Ikeshima, Nobuaki Minematsu

Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our previously developed method, which modifies generated F0 contours in the framework of the generation process model, is improved to allow multiple phrase components in a breath group. Since the model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method further enables flexible control of prosody by manipulating the model commands. Prosodic focus is realized in HMM-based speech synthesis as a supplemental process based on the differences in command magnitudes/amplitudes between utterances without and with focus. The validity of the method was confirmed by listening experiments on synthetic speech.

Index Terms: fundamental frequency contour, generation process model, HMM-based speech synthesis, prosodic focus
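
As an illustration of how prosody can be controlled through model commands, the following is a minimal sketch of the generation process model in its commonly cited form, where log F0 is a baseline plus phrase components (impulse responses) and accent components (step responses). It is not the implementation used in this work. The constants `ALPHA`, `BETA` and `GAMMA` are typical values assumed here, all command timings and magnitudes are hypothetical, and `add_command_differences` is a hypothetical helper that adds per-command amplitude differences such as those observed between utterances without and with focus.

```python
import math

# Assumed typical constants (not taken from this paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism
BETA = 20.0   # natural angular frequency of the accent control mechanism
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0 at time t (seconds).

    phrase_cmds: list of (onset, magnitude); several may share a breath group.
    accent_cmds: list of (onset, offset, amplitude).
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def add_command_differences(accent_cmds, deltas):
    """Hypothetical helper: add an amplitude difference to each accent command."""
    return [(t1, t2, aa + d) for (t1, t2, aa), d in zip(accent_cmds, deltas)]


# Hypothetical breath group with two phrase components and two accent commands.
phrase_cmds = [(0.0, 0.5), (1.2, 0.3)]
accent_cmds = [(0.2, 0.6, 0.4), (1.4, 1.8, 0.4)]

# Raise the second accent command to emulate focus on the second word.
focused_cmds = add_command_differences(accent_cmds, [0.0, 0.2])

neutral = log_f0(1.6, 120.0, phrase_cmds, accent_cmds)
focused = log_f0(1.6, 120.0, phrase_cmds, focused_cmds)
```

Because the accent commands are explicit parameters, focus can be added after contour generation without retraining or altering the frame-level models, which is the sense in which the process is supplemental.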