ISCA Archive Interspeech 2010

Machine learning for text selection with expressive unit-selection voices

Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew

We show that a ranking model produced by machine learning outperforms two baselines when applied to the task of selecting texts for use in creating a unit-selection synthesis voice with good domain coverage. The model learns to predict the estimated utility of an utterance based on features relating it to the utterances selected so far and a corpus of target utterances. Our analyses indicate that our discriminative approach continues to work well even though the presence of rich prosodic and non-prosodic features significantly expands the search space beyond what has previously been handled by greedy methods.
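The selection procedure described above can be pictured as an iterative loop. At each step, every remaining candidate is scored by a function of features that relate it to the utterances selected so far and to a target corpus, and the top-scoring candidate is added. The following is a minimal sketch under stated assumptions, not the authors' implementation. The unit representation, the three features (`new_types`, `new_target_mass`, `length`), and the helper names `features`, `linear_score` and `greedy_select` are all hypothetical. The hand-set linear weights stand in for whatever scoring model a learned ranker would supply.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical unit type: e.g. a diphone label, possibly tagged with prosodic context.
Unit = str
Utterance = Sequence[Unit]


def features(candidate: Utterance, covered: Set[Unit], target_freq: Counter) -> Dict[str, float]:
    """Hypothetical features relating a candidate to the current selection and the target corpus."""
    new_units = set(candidate) - covered
    return {
        "new_types": float(len(new_units)),  # unit types not yet covered
        "new_target_mass": float(sum(target_freq[u] for u in new_units)),  # target frequency they add
        "length": float(len(candidate)),  # proxy for recording cost
    }


def linear_score(weights: Dict[str, float]) -> Callable[[Dict[str, float]], float]:
    """Stand-in for a learned ranking model: a weighted sum of the feature values."""
    return lambda f: sum(weights.get(name, 0.0) * value for name, value in f.items())


def greedy_select(pool: List[Utterance], target_corpus: List[Utterance],
                  scorer: Callable[[Dict[str, float]], float], budget: int) -> List[int]:
    """Repeatedly pick the highest-scoring remaining utterance, then update coverage."""
    target_freq = Counter(u for utt in target_corpus for u in utt)
    covered: Set[Unit] = set()
    remaining = list(range(len(pool)))
    chosen: List[int] = []
    while remaining and len(chosen) < budget:
        # Scores are recomputed every step because features depend on what is already selected.
        best = max(remaining, key=lambda i: scorer(features(pool[i], covered, target_freq)))
        chosen.append(best)
        covered |= set(pool[best])
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    pool = [["a", "b"], ["a", "c", "d"], ["b"], ["e"]]
    target = [["a", "b", "c"], ["a", "d"], ["e", "e", "e"]]
    scorer = linear_score({"new_target_mass": 1.0, "length": -0.1})
    print(greedy_select(pool, target, scorer, budget=2))  # prints [1, 3]
```

Because the features change as the selection grows, the scorer must be re-applied to every remaining candidate at each step. This is one reason a large feature space (for example, units annotated with prosodic context) makes the search harder than plain coverage counting.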