ISCA Archive ICSLP 2002

Speaker recognizability evaluation of a voicefont-based text-to-speech system

Masaharu Sakamoto, Takashi Saito

We have developed a new text-to-speech system based on VoiceFont technology. A VoiceFont is a voice dictionary for speech synthesis that holds the acoustic and prosodic characteristics extracted from the voice corpus of a speaker. A text-to-speech system using a VoiceFont can synthetically mimic the voice of the donor speaker. In this paper, we evaluate the speaker recognizability of the synthetic speech, that is, whether the synthetic speech can be recognized as the donor speaker’s voice. We conducted a subjective evaluation of five VoiceFonts and report the results here. The results show that our VoiceFont-based text-to-speech system retains the acoustic and prosodic characteristics of the donor speaker, and that the synthetic speech can be recognized as the donor speaker’s voice. Furthermore, we report how much the spectral characteristics, phoneme duration, and pitch frequency each affect speaker recognizability.