ISCA Archive Interspeech 2014

Text-independent voice conversion using speaker model alignment method from non-parallel speech

Peng Song, Yun Jin, Wenming Zheng, Li Zhao

In this paper, we propose a novel voice conversion method, speaker model alignment (SMA), that does not require parallel training speech. First, the source and target speaker models, each represented by a Gaussian mixture model (GMM), are trained separately. Then, the transformation function for spectral features is learned by iteratively aligning the components of the source and target speaker models. The transformation function is further combined with the GMM to enable multiple local mappings, and a local consistent GMM (LCGMM) is also considered for model training to improve conversion accuracy. Finally, we carry out experiments to evaluate the proposed method. Objective and subjective results show that, compared with the well-known INCA approach, the proposed method achieves lower spectral distortion and higher correlation, and obtains a significant improvement in perceptual quality and similarity.
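
The abstract does not state the alignment update rule, so the following is only an illustrative sketch of what "iteratively aligning the components" of two speaker models could look like, not the authors' SMA algorithm. It assumes that each GMM is summarized by its component means alone (weights and covariances are ignored), that components are paired by nearest Euclidean distance, and that the transformation is a per-dimension affine map fitted by least squares. All function names (`nearest`, `fit_diag_affine`, `apply_affine`, `align_means`) are hypothetical.

```python
# Illustrative sketch only: an iterative nearest-neighbour alignment of GMM
# component means with a per-dimension affine map. Assumption, not the
# paper's actual SMA update rule.

def nearest(point, candidates):
    """Index of the candidate closest to point (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, candidates[j])),
    )

def fit_diag_affine(xs, ys):
    """Least-squares fit of y_d = a_d * x_d + b_d, independently per dimension."""
    n, dim = len(xs), len(xs[0])
    scale, shift = [], []
    for d in range(dim):
        mx = sum(x[d] for x in xs) / n
        my = sum(y[d] for y in ys) / n
        var = sum((x[d] - mx) ** 2 for x in xs)
        cov = sum((x[d] - mx) * (y[d] - my) for x, y in zip(xs, ys))
        a = cov / var if var > 1e-12 else 1.0  # fall back to unit scale
        scale.append(a)
        shift.append(my - a * mx)
    return scale, shift

def apply_affine(x, scale, shift):
    """Apply the per-dimension affine map to one feature vector."""
    return [a * v + b for a, v, b in zip(scale, x, shift)]

def align_means(src_means, tgt_means, n_iter=10):
    """Alternate between pairing components and refitting the affine map."""
    dim = len(src_means[0])
    scale, shift = [1.0] * dim, [0.0] * dim  # start from the identity map
    for _ in range(n_iter):
        # Move the source means with the current map, then pair each one
        # with its nearest target mean.
        moved = [apply_affine(m, scale, shift) for m in src_means]
        matched = [tgt_means[nearest(m, tgt_means)] for m in moved]
        # Refit the map from the original source means to their matches.
        scale, shift = fit_diag_affine(src_means, matched)
    return scale, shift
```

In a fuller system the means would come from GMMs trained on each speaker's non-parallel speech, and the learned map would be applied to source spectral feature vectors via `apply_affine`. A nearest-neighbour pairing like this one can converge to a poor solution when the two speakers' feature spaces differ substantially, which is one reason the sketch should not be read as the published method.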


doi: 10.21437/Interspeech.2014-187

Cite as: Song, P., Jin, Y., Zheng, W., Zhao, L. (2014) Text-independent voice conversion using speaker model alignment method from non-parallel speech. Proc. Interspeech 2014, 2308-2312, doi: 10.21437/Interspeech.2014-187

@inproceedings{song14_interspeech,
  author={Peng Song and Yun Jin and Wenming Zheng and Li Zhao},
  title={{Text-independent voice conversion using speaker model alignment method from non-parallel speech}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2308--2312},
  doi={10.21437/Interspeech.2014-187},
  issn={2308-457X}
}