To make synthesized speech more natural and colloquial, its regularity has to be overcome and spontaneous speech effects have to be integrated into the synthesis process. As a first step towards spontaneous speech, we introduced different duration control methods in speech synthesis.
In this paper we summarize the results of previous work on changing the speaking rate indirectly by controlling the grapheme-to-phoneme conversion through different pronunciation variant selection algorithms. The results of the listening experiments show a significant improvement in the category "colloquial impression".
To evaluate the quality of the best-performing variant selection approach against the canonical synthesis (the state-of-the-art system), we performed a new listening test on longer speech samples. The variant synthesis applying a pronunciation variant sequence model achieved a significantly lower listening effort and a higher overall rating (MOS) than the canonical synthesis.