ISCA Archive SpeechProsody 2014

Pause insertion prediction using evaluation model of perceptual pause insertion naturalness

Hiroko Muto, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno

This paper describes a pause insertion prediction technique that makes synthesized speech from text-to-speech (TTS) systems sound more natural. The novelty of the technique is that it combines a machine-learning prediction model with an evaluation model of perceptual pause insertion naturalness. The evaluation model represents the relationship between several pause-insertion features and the perceptual naturalness of pause insertion, as measured in a subjective evaluation. First, the prediction model produces the N-best sequences, each of which indicates whether or not a pause is present at each phrase boundary. The evaluation model then estimates a pause insertion naturalness score for each N-best sequence, and the sequence with the highest score is selected. Objective and subjective evaluation results show that the proposed technique outperforms a conventional technique.
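
The two-stage selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the assumption that boundaries are independent, the brute-force enumeration, and the hand-set scoring rule (penalize long stretches without a pause, plus a small cost per pause) are all hypothetical stand-ins. In the paper, the evaluation model is derived from subjective evaluation data and the prediction model is learned from data.

```python
from itertools import product
from math import log


def n_best_sequences(pause_probs, n):
    """Return the n most probable pause (1) / no-pause (0) sequences.

    Hypothetical stand-in for the prediction model: boundaries are treated
    as independent, and every sequence is enumerated, so this is only
    practical for short sentences. Probabilities must lie strictly in (0, 1).
    """
    scored = []
    for seq in product((0, 1), repeat=len(pause_probs)):
        logp = sum(log(p if s else 1.0 - p) for s, p in zip(seq, pause_probs))
        scored.append((logp, seq))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]


def naturalness_score(seq, max_run=3):
    """Toy stand-in for the evaluation model (assumed, not from the paper).

    Penalizes runs of more than max_run consecutive boundaries without a
    pause, and adds a small cost for every inserted pause.
    """
    longest = run = 0
    for s in seq:
        run = 0 if s else run + 1
        longest = max(longest, run)
    return -float(max(0, longest - max_run)) - 0.1 * sum(seq)


def select_sequence(pause_probs, n=4):
    """Rerank the N-best candidates and keep the most natural one."""
    candidates = n_best_sequences(pause_probs, n)
    best = max(candidates, key=lambda item: naturalness_score(item[1]))
    return best[1]


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.45, 0.2, 0.35]  # illustrative values only
    print(n_best_sequences(probs, 1)[0][1])  # 1-best from the predictor alone
    print(select_sequence(probs, n=4))       # choice after reranking
```

With these illustrative probabilities, the predictor's 1-best sequence contains no pause, while reranking selects a candidate with a single pause in the middle of the sentence. This is the kind of correction the second stage is meant to provide.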