ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

Emotional Voice Conversion with Semi-Supervised Generative Modeling

Hai Zhu, Huayi Zhan, Hong Cheng, Ying Wu

Emotional Voice Conversion (EVC) is a task that aims to convert the emotional state of speech from one to another while preserving the linguistic information and identity of the speaker. However, many studies are limited by the requirement for parallel speech data between different emotional patterns, which is not widely available in real-life applications. Furthermore, the annotation of emotional data is highly time-consuming and labor-intensive. To address these problems, in this paper, we propose SGEVC, a novel semi-supervised generative model for emotional voice conversion. This paper demonstrates that using as little as 1% supervised data is sufficient to achieve EVC. Experimental results show that our proposed model achieves state-of-the-art (SOTA) performance and consistently outperforms EVC baseline frameworks.