ISCA Archive ECST 1987

Voice conversion: a model for studying voice quality and speaker normalization

Donald G. Childers, Ke Wu, D. M. Hicks

This paper describes speech analysis and synthesis factors that are important for synthesizing high-quality, i.e., natural-sounding, speech. We consider factors related to 1) the synthesis model, 2) objective measures of quality, including spectral replication, continuity, and tracking, 3) glottal excitation waveforms and parameters, including source-tract interaction, jitter, and shimmer, and 4) speech analysis, e.g., window shapes and sizes and the accurate identification of voiced/unvoiced/silent segments and of fundamental frequency. We have tested three synthesizers (LPC, formant, and articulatory) and present conclusions from both formal and informal listener evaluations of the LPC and formant synthesizers.
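
As an illustration of two of the excitation parameters named above: jitter and shimmer are commonly quantified as the cycle-to-cycle perturbation of the pitch period and of the peak amplitude, respectively. The sketch below is not taken from the paper. It assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean value), and the function name and sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Applied to pitch periods this gives a local jitter ratio; applied to
    per-cycle peak amplitudes it gives a local shimmer ratio.
    (Assumed definition; the paper's own measure may differ.)
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Absolute change from each cycle to the next.
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements for four consecutive glottal cycles.
periods_ms = [8.0, 8.2, 7.9, 8.1]
peak_amplitudes = [1.00, 0.96, 1.02, 0.98]

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(peak_amplitudes)
print(f"jitter = {jitter:.2%}, shimmer = {shimmer:.2%}")
```

A perfectly periodic, constant-amplitude excitation yields zero for both ratios. Natural voices show small nonzero values, which is presumably why the abstract lists these parameters among the factors that affect naturalness.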