ISCA Archive ICSLP 2002

Combined prosody and candidate unit selections for corpus-based text-to-speech systems

Francisco Campillo-Díaz, Eduardo R. Banga

Traditionally, corpus-based text-to-speech systems generate the speech signal in a two-stage process. First, the target prosody is determined; then, a set of speech units that minimizes a cost function is selected. Once the target prosody is fixed, no alternative prosodic information is generally considered, even when appropriate speech units are not found. In this paper we propose an alternative technique that takes into account several possible intonation contours and selects the one that minimizes the cost function. In this method, both the candidate pitch contours and the candidate speech units are obtained by means of a unit selection process.
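
The idea of choosing the intonation contour jointly with the speech units can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Unit` class, the absolute-difference cost functions, the per-position pitch targets, and the exhaustive loop over contours. The candidate contours are simply given as input here, whereas in the proposed method they are themselves obtained by unit selection.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical speech unit described only by its mean pitch."""
    name: str
    f0: float  # mean pitch in Hz


def target_cost(unit: Unit, target_f0: float) -> float:
    # Assumed target cost: distance between unit pitch and target pitch.
    return abs(unit.f0 - target_f0)


def concat_cost(prev: Unit, cur: Unit) -> float:
    # Assumed concatenation cost: pitch jump at the join.
    return abs(prev.f0 - cur.f0)


def select_units(
    candidates: Sequence[Sequence[Unit]], contour: Sequence[float]
) -> Tuple[float, List[Unit]]:
    """Viterbi search for the cheapest unit sequence given one contour."""
    # best holds, for each candidate at the current position,
    # the accumulated cost and the path that ends in that candidate.
    best = [(target_cost(u, contour[0]), [u]) for u in candidates[0]]
    for pos in range(1, len(candidates)):
        new_best = []
        for u in candidates[pos]:
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(u, contour[pos]), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


def select_contour_and_units(
    candidates: Sequence[Sequence[Unit]], contours: Sequence[Sequence[float]]
) -> Tuple[float, Sequence[float], List[Unit]]:
    """Run unit selection for every candidate contour and keep the cheapest."""
    results = [(select_units(candidates, c), c) for c in contours]
    (cost, path), contour = min(results, key=lambda r: r[0][0])
    return cost, contour, path


if __name__ == "__main__":
    candidates = [
        [Unit("a1", 100.0), Unit("a2", 140.0)],
        [Unit("b1", 110.0), Unit("b2", 150.0)],
    ]
    contours = [[120.0, 130.0], [140.0, 150.0]]
    cost, contour, path = select_contour_and_units(candidates, contours)
    print(cost, contour, [u.name for u in path])
```

With a single fixed contour, the search would be limited to whatever units happen to fit that contour; letting the cost function also decide among contours allows the system to prefer a contour for which well-matched units exist in the corpus.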