ISCA Archive SpeechProsody 2006

An innovative F0 modeling approach for emphatic affirmative speech, applied to the Greek language

Georgios P. Giannopoulos, Aimilios E. Chalamandaris

The prosody generation engine, which is responsible for the naturalness of synthetic speech, remains one of the most important components of a Text-to-Speech (TtS) synthesis system. In this paper we present an innovative algorithm for modeling the fundamental frequency (F0) of Greek sentences that contain emphatic segments. The main idea of our approach is to define a specific set of intonation word models, derived from a spoken corpus, that is sufficient for modeling the pitch contour of arbitrarily long sentences with a similar structure. The method is based on a prosodic unit selection approach. It has been tested on Ekfonitis+ [1], ILSP’s TtS system for the Greek language, which is customized to utter weather reports with a near-natural synthetic voice. The system was designed and trained on a spoken corpus of 120 naturally uttered weather-forecast sentences containing emphatic segments, and it has proved very efficient in coping with similarly structured sentences. In the first section of the paper we briefly review the existing literature in this field, together with analogous approaches for other languages. In the second section we present our method and the design procedure. The last two sections contain the preliminary results of our experiments, our conclusions, and directions for future work.
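The abstract does not specify how word models are chosen, so the following is only a generic illustration of prosodic unit selection over word-level F0 models, not the authors' algorithm. It assumes each corpus word model carries a sampled F0 contour plus two context labels (emphatic or not, and sentence position), and picks one model per target word by a Viterbi search that minimizes a target cost (label mismatch) plus a join cost (F0 jump at word boundaries). The names `WordModel`, `TargetWord`, `target_cost`, `join_cost`, `select_units`, `f0_contour`, the cost weights, and the toy inventory are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WordModel:
    """Hypothetical intonation word model taken from a spoken corpus."""
    name: str
    f0: Tuple[float, ...]  # sampled F0 contour in Hz
    emphatic: bool         # was the word emphasised in the corpus?
    position: str          # "initial", "medial" or "final" in the sentence


@dataclass(frozen=True)
class TargetWord:
    """Context labels of a word in the sentence to be synthesised."""
    emphatic: bool
    position: str


def target_cost(t: TargetWord, m: WordModel) -> float:
    # Penalise context-label mismatches; the weights are arbitrary assumptions.
    cost = 0.0
    if t.emphatic != m.emphatic:
        cost += 10.0
    if t.position != m.position:
        cost += 5.0
    return cost


def join_cost(prev: WordModel, nxt: WordModel) -> float:
    # Penalise the F0 discontinuity (Hz) at the boundary between two words.
    return abs(prev.f0[-1] - nxt.f0[0]) / 10.0


def select_units(targets: Sequence[TargetWord],
                 inventory: Sequence[WordModel]) -> List[WordModel]:
    """Viterbi search for the model sequence with the minimum total cost."""
    if not targets:
        return []
    n = len(inventory)
    # best[i][j]: lowest cost of any path that ends with inventory[j] at word i
    best = [[target_cost(targets[0], m) for m in inventory]]
    back: List[List[int]] = [[-1] * n]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for m in inventory:
            k = min(range(n), key=lambda p: best[-1][p] + join_cost(inventory[p], m))
            row.append(best[-1][k] + join_cost(inventory[k], m)
                       + target_cost(targets[i], m))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final state.
    j = min(range(n), key=lambda p: best[-1][p])
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [inventory[j] for j in path]


def f0_contour(models: Sequence[WordModel]) -> List[float]:
    # Concatenate the selected word contours into one sentence-level contour.
    return [v for m in models for v in m.f0]


# Hypothetical toy inventory and target sentence (not data from the paper).
inventory = [
    WordModel("init_neutral", (180.0, 200.0, 190.0), False, "initial"),
    WordModel("med_emph", (195.0, 260.0, 210.0), True, "medial"),
    WordModel("med_neutral", (190.0, 185.0, 175.0), False, "medial"),
    WordModel("final_neutral", (205.0, 170.0, 140.0), False, "final"),
]
targets = [TargetWord(False, "initial"), TargetWord(True, "medial"),
           TargetWord(False, "final")]

if __name__ == "__main__":
    chosen = select_units(targets, inventory)
    print([m.name for m in chosen])  # ['init_neutral', 'med_emph', 'final_neutral']
    print(f0_contour(chosen))
```

Because the search is a dynamic program, its cost grows linearly with sentence length, which is one reason unit selection can in principle scale to arbitrarily long sentences built from a fixed inventory of word models.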