ISCA Archive ICSLP 2002

Transformation of spectral envelope for voice conversion based on radial basis function networks

Tomomi Watanabe, Takahiro Murakami, Munehiro Namba, Tetsuya Hoya, Yoshihisa Ishida

This paper presents a novel algorithm that modifies the speech uttered by a source speaker so that it sounds as if it were produced by a target speaker. In particular, we address the transformation of vocal tract characteristics from one speaker to another. The approach is based on estimating spectral envelopes using radial basis function (RBF) networks, a well-known class of artificial neural network models. Simulation results show that the proposed method achieves nearly optimal spectral conversion performance. Moreover, the average cepstrum distance to the target speech is reduced by 87%, and in listening tests a mean opinion score (MOS) of around 84% is obtained.
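The abstract does not specify the network configuration, so the following is only a minimal sketch of how an RBF network could map source-speaker spectral features to target-speaker features. Everything beyond the general idea is an assumption made for illustration: Gaussian kernels, centers sampled from the training frames, output weights fitted by linear least squares, time-aligned cepstral vectors as features, and the hypothetical helper names `rbf_design`, `fit_rbf`, `convert` and `mean_cepstral_distance`. The data are synthetic and unrelated to the results reported above.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF activations (one column per center) plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X_src, Y_tgt, centers, width):
    """Fit the output weights by linear least squares (assumed training rule)."""
    Phi = rbf_design(X_src, centers, width)
    W, *_ = np.linalg.lstsq(Phi, Y_tgt, rcond=None)
    return W

def convert(X_src, centers, width, W):
    """Map source feature frames to estimated target feature frames."""
    return rbf_design(X_src, centers, width) @ W

def mean_cepstral_distance(C_a, C_b):
    """Mean Euclidean distance between paired feature frames."""
    return float(np.mean(np.linalg.norm(C_a - C_b, axis=1)))

# Synthetic stand-in for time-aligned source/target cepstral frames.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # "source" frames (assumed 8-dim)
Y = np.tanh(X) + 3.0                 # "target" frames: smooth warp plus offset

# Assumption: centers are a random subset of the training frames.
centers = X[rng.choice(len(X), size=40, replace=False)]
width = 2.0                          # assumed kernel width

W = fit_rbf(X, Y, centers, width)
Y_hat = convert(X, centers, width, W)

before = mean_cepstral_distance(X, Y)      # unconverted source vs. target
after = mean_cepstral_distance(Y_hat, Y)   # converted source vs. target
reduction = 100.0 * (before - after) / before
print(f"distance before: {before:.3f}, after: {after:.3f}, "
      f"reduction: {reduction:.1f}%")
```

The relative reduction computed at the end mirrors the kind of figure quoted in the abstract (distance to the target before and after conversion), but here it is measured on the training frames of a toy mapping, so its value says nothing about the paper's method.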