ISCA Archive Interspeech 2023

Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator

Yeonjong Choi, Chao Xie, Tomoki Toda

Voice conversion (VC) is increasingly being deployed in real-world scenarios where background sounds and reverberation are inevitable. However, most VC studies focus on clean speech conversion, which requires high-quality speech data for training and testing. Moreover, background sounds and reverberation are treated as interference to be discarded, even though they carry information worth retaining in some scenarios, such as movie dubbing and singing VC. In this paper, we propose a reverberation-robust VC framework consisting of a reverberation time (T60) estimation module and a VC module. The T60 estimator provides the VC module with reverberation information so that it can model reverberant speech. Experimental results show that 1) our framework can disentangle and control the speaker identity and reverberation of the speech, and 2) it achieves acceptable VC performance on reverberant speech, even when clean training data are not available.
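
For readers unfamiliar with T60, the quantity is the time it takes for the sound energy in a room to decay by 60 dB. The sketch below is not the estimator proposed in this paper, which is not specified in the abstract. It illustrates the quantity with the classical Schroeder backward-integration method, and it assumes the room impulse response is known, which is typically not the case when only reverberant speech is available. The function names `estimate_t60` and `synthetic_rir`, the fit range, and the synthetic decaying-noise impulse response are all illustrative assumptions.

```python
import numpy as np


def estimate_t60(rir, sample_rate, fit_range_db=(-5.0, -25.0)):
    """Estimate T60 (seconds) from a known room impulse response.

    Uses Schroeder backward integration and a line fit to the decay curve.
    This is a classical illustration, not the paper's estimator.
    """
    energy = np.asarray(rir, dtype=np.float64) ** 2
    # Energy decay curve: energy remaining from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)

    # Fit a line to the portion of the curve between the two dB limits.
    upper_db, lower_db = fit_range_db
    idx = np.where((edc_db <= upper_db) & (edc_db >= lower_db))[0]
    times = idx / sample_rate
    slope_db_per_s, _ = np.polyfit(times, edc_db[idx], 1)

    # Extrapolate the fitted slope to a full 60 dB decay.
    return -60.0 / slope_db_per_s


def synthetic_rir(t60, sample_rate, duration=1.5, seed=0):
    """Hypothetical impulse response: white noise with an exponential decay."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * sample_rate)) / sample_rate
    # Amplitude envelope chosen so that energy falls by 60 dB at t = t60.
    envelope = 10.0 ** (-3.0 * t / t60)
    return rng.standard_normal(t.size) * envelope


if __name__ == "__main__":
    rir = synthetic_rir(t60=0.5, sample_rate=16000)
    print(f"estimated T60: {estimate_t60(rir, 16000):.3f} s")
```

Fitting over a limited decay range and extrapolating to 60 dB is a common choice because the tail of a measured impulse response is usually dominated by noise.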