ISCA Archive ICSLP 2002

Modeling durational variability in reading aloud a connected text

Caroline L. Smith

One of the most striking features of speech produced by humans is its enormous variability, especially in the prosody. If speech synthesizers are ever to sound more natural, they must reproduce some of this variability. A portion of the variability can be attributed to known linguistic factors, but a substantial amount remains of unknown origin. Statistical techniques can be used to imitate this variability in a probabilistic way, but it may also be possible to reduce the proportion that is attributed to unknown factors. This study investigates durational variability in ten readings of an extended passage of text by an American English speaker. The focus is on how the structure of topics in the spoken material can explain some variability in the acoustic durations, and on how variability from this and other sources might be modeled in synthesized speech.