ISCA Archive Interspeech 2014

Voice expression conversion with factorised HMM-TTS models

Javier Latorre, Vincent Wan, Kayoko Yanagisawa

This paper proposes a method to modify the expression or emotion in a sample of speech without altering the speaker's identity. The method exploits a statistical speech model that factorises speaker identity from expression using linear transforms. With this approach, the set of transforms that best fits the speaker and expression of the input speech sample is learned. The speaker transforms are then combined with the expression transforms of the desired expression, taken from another speaker. Since the combined expression transform is factorised and contains information about expression only, it may be applied to the original speech sample to change its expression to the desired one without altering the identity of the speaker. Notably, this method may be applied to any voice without the need for a parallel training corpus.
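To illustrate the factorisation idea described above, the following is a minimal sketch, not the paper's actual implementation: speaker identity and expression are each represented as an affine (CMLLR-style) transform of an average-voice model's mean vectors, and swapping in another speaker's expression transform leaves the speaker transform, and hence the identity, untouched. All names and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
mu_avg = rng.standard_normal(dim)  # mean vector of the average-voice model

def random_affine(rng, dim):
    """Random affine transform (A, b), standing in for an estimated transform."""
    return rng.standard_normal((dim, dim)), rng.standard_normal(dim)

# Transform estimated for the input speaker's identity.
A_spk, b_spk = random_affine(rng, dim)
# Expression transform for the desired expression, taken from another speaker.
A_expr, b_expr = random_affine(rng, dim)

# Cascade the factorised transforms: first map the average voice to the
# target speaker, then apply the expression transform on top. Because the
# two factors are kept separate, the expression can be swapped without
# changing the speaker transform.
mu_spk = A_spk @ mu_avg + b_spk    # speaker-adapted mean (identity preserved)
mu_out = A_expr @ mu_spk + b_expr  # expression applied on top
```

The key design point is that the expression factor is estimated independently of the speaker factor, so no parallel corpus of the same speaker in multiple expressions is required.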


doi: 10.21437/Interspeech.2014-363

Cite as: Latorre, J., Wan, V., Yanagisawa, K. (2014) Voice expression conversion with factorised HMM-TTS models. Proc. Interspeech 2014, 1514-1518, doi: 10.21437/Interspeech.2014-363

@inproceedings{latorre14b_interspeech,
  author={Javier Latorre and Vincent Wan and Kayoko Yanagisawa},
  title={{Voice expression conversion with factorised HMM-TTS models}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1514--1518},
  doi={10.21437/Interspeech.2014-363},
  issn={2308-457X}
}