ISCA Archive Interspeech 2009

Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems

Cédric Boidin, Verena Rieser, Lonneke van der Plas, Oliver Lemon, Jonathan Chevelu

This paper presents a method for adaptively re-ranking paraphrases in a Spoken Dialogue System (SDS) according to their predicted Text-to-Speech (TTS) quality. We collect data under 4 different conditions and extract a rich set of 55 TTS runtime features. We build predictive models of user ratings using linear regression with latent variables. We then show that these models transfer to a more specific target domain on a separate test set. All our models significantly outperform a random baseline. Our best-performing model matches the performance reported in previous work while requiring 75% less annotated training data. The TTS re-ranking model is part of an end-to-end statistical architecture for Spoken Dialogue Systems developed by the EC FP7 CLASSiC project.
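The re-ranking idea can be illustrated with a minimal sketch. It assumes plain ordinary least squares as a stand-in for the paper's latent-variable regression, and uses two made-up numeric features per paraphrase rather than the paper's 55 TTS runtime features; the function names `fit_rating_model` and `rerank`, the training data, and the example prompts are all hypothetical. A rating model is fitted to annotated (features, rating) pairs, and candidate paraphrases are then sorted by their predicted rating, best first.

```python
import numpy as np


def fit_rating_model(features, ratings):
    """Fit rating ~ w . features + b by ordinary least squares.

    Hypothetical stand-in for a latent-variable regression model.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(ratings, dtype=float)
    # Append a column of ones so the solver also learns the intercept b.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]


def rerank(candidates, weights, bias):
    """Sort (paraphrase, feature_vector) pairs by predicted rating, highest first."""
    scored = [(text, float(np.dot(weights, f) + bias)) for text, f in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy training data (invented): ratings follow 5 - 1.0*f1 - 0.5*f2 exactly.
train_X = [[0, 0], [1, 0], [0, 2], [2, 2], [1, 1]]
train_y = [5.0, 4.0, 4.0, 2.0, 3.5]
w, b = fit_rating_model(train_X, train_y)

# Hypothetical paraphrases of one dialogue prompt, each with invented features.
candidates = [
    ("Your flight leaves at nine.", [0, 1]),
    ("The flight you booked departs at 9 a.m.", [3, 2]),
    ("At nine your flight will leave.", [1, 0]),
]
ranked = rerank(candidates, w, b)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

In a running system, the top-ranked paraphrase would be the one passed to the synthesizer; any regression model that outputs a scalar rating could replace the least-squares fit without changing `rerank`.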