ISCA Archive Interspeech 2009

A novel technique for voice conversion based on style and content decomposition with bilinear models

Victor Popa, Jani Nurminen, Moncef Gabbouj

This paper presents a novel technique for voice conversion that treats the problem as a two-factor task solved with bilinear models. The spectral content of speech, represented as line spectral frequencies (LSFs), is separated into so-called style and content parameterizations using a framework proposed in [1]. Formulating voice conversion in terms of style and content offers a flexible representation of factor interactions and allows efficient training algorithms based on singular value decomposition and expectation maximization. In a comparison with the traditional Gaussian mixture model (GMM) based method, the results are promising and indicate increased robustness with small training sets.
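As a rough illustration of the SVD-based training mentioned above, the sketch below fits an asymmetric bilinear model, y = A^s b^c, where A^s is a style (speaker) matrix and b^c a content vector, by truncated SVD of the stacked observation matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fit_asymmetric_bilinear` and `convert` are hypothetical, random data stands in for time-aligned LSF frames, the dimensions `K`, `J`, `C` are arbitrary, and the conversion step (least-squares estimate of the content vector from the source style, then rendering it with the target style) is one assumed way of using such a model, not necessarily the procedure of the paper.

```python
import numpy as np


def fit_asymmetric_bilinear(Y_by_style, J):
    """Fit y^{sc} ~ A^s b^c by truncated SVD (hypothetical helper).

    Y_by_style: list of S arrays, each K x C; column c holds the observation
    of content class c rendered in style s (assumed already aligned).
    Returns a list of S style matrices (K x J) and a content matrix B (J x C).
    """
    S = len(Y_by_style)
    K, _ = Y_by_style[0].shape
    Y = np.vstack(Y_by_style)                    # stacked (S*K) x C matrix
    U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * sv[:J]                        # (S*K) x J, singular values folded in
    B = Vt[:J, :]                                # J x C content vectors
    A_styles = [A[s * K:(s + 1) * K, :] for s in range(S)]
    return A_styles, B


def convert(y_src, A_src, A_tgt):
    """Assumed conversion: least-squares content estimate, then target style."""
    b, *_ = np.linalg.lstsq(A_src, y_src, rcond=None)
    return A_tgt @ b


if __name__ == "__main__":
    # Synthetic stand-in for aligned source/target spectral frames.
    rng = np.random.default_rng(0)
    K, J, C = 10, 3, 50
    A_true = [rng.standard_normal((K, J)) for _ in range(2)]
    B_true = rng.standard_normal((J, C))
    A_fit, B_fit = fit_asymmetric_bilinear([A @ B_true for A in A_true], J)
    b_new = rng.standard_normal(J)
    y_conv = convert(A_true[0] @ b_new, A_fit[0], A_fit[1])
    print("max conversion error:", np.max(np.abs(y_conv - A_true[1] @ b_new)))
```

The fitted style matrices differ from the generating ones by an invertible J x J transform, but because the content estimate absorbs that transform, the converted vector is unaffected when the data are exactly rank J. Real LSF data are noisy and not exactly bilinear, so the choice of J trades fit against robustness.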