ISCA Archive Eurospeech 2001
ISCA Archive Eurospeech 2001

Smooth contour estimation in data-driven pitch modelling

Kim E. A. Silverman, Jerime R. Bellegarda, Kevin A. Lenzo

Apple's next-generation text-to-speech system in MacOS X uses a superpositional pitch model, comprising a relatively smooth underlying F0 contour and a separate contribution from the influence of the phonetic segments. This paper focuses on the data-driven modelling of the underlying contour, based on electroglottographic signals obtained from a corpus of reiterant speech. F0 extraction from such signals leads to more accurate characteristic shapes, as objectively illustrated by a typically low mean absolute frequency deviation (between 2 and 3 Hz) between original and synthetic F0 contours. This in turn supports a better (both more complete and more realistic) model of F0 behavior. Experimental results illustrate the improved prosodic representation resulting from this F0 model.