ISCA Archive Interspeech 2025

SepVAC: Multitask Learning of Speaker Separation, Speaker Localization, Microphone Array Localization, and Room Acoustic Parameter Estimation in Various Acoustic Conditions

Roland Hartanto, Sakriani Sakti, Koichi Shinoda

This paper proposes SepVAC, a multitask learning method for speech separation that jointly Separates speech and estimates the recording conditions in Various Acoustic Conditions. Unlike previous methods, which aim for robustness against the uncertainty caused by noise and reverberation, this method explicitly estimates speaker and microphone locations and room acoustic parameters to disambiguate them from speech features. We introduce curriculum learning to stabilize the training of the model parameters. In our evaluation on the SMS-WSJ-Plus dataset, SepVAC outperforms the state-of-the-art SpatialNet baseline by 0.67 points in word error rate (WER).
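
As a rough illustration of how a multitask objective can be combined with a curriculum, the sketch below keeps the separation loss at full weight and phases in the auxiliary estimation losses over training. The abstract does not specify the curriculum or the loss weighting, so the staged schedule, the task names, the start epochs, and the ramp length are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict

# Hypothetical names and schedule values; none of them come from the paper.
PRIMARY_TASK = "separation"
AUX_START_EPOCH: Dict[str, int] = {
    "speaker_localization": 5,
    "array_localization": 10,
    "room_acoustics": 15,
}
RAMP_EPOCHS = 5  # epochs over which an auxiliary weight grows from 0 to 1


def task_weight(task: str, epoch: int) -> float:
    """Return the loss weight of `task` at `epoch` under the assumed schedule."""
    if task == PRIMARY_TASK:
        return 1.0  # the primary task is always fully weighted
    progress = (epoch - AUX_START_EPOCH[task]) / RAMP_EPOCHS
    return max(0.0, min(1.0, progress))  # linear ramp clipped to [0, 1]


def multitask_loss(losses: Dict[str, float], epoch: int) -> float:
    """Weighted sum of per-task losses for the current epoch."""
    return sum(task_weight(task, epoch) * value for task, value in losses.items())


if __name__ == "__main__":
    example = {"separation": 2.0, "speaker_localization": 1.0}
    for epoch in (0, 7, 10):
        print(epoch, multitask_loss(example, epoch))
```

Under this assumed schedule, early epochs optimize separation alone, and each auxiliary loss reaches full weight five epochs after it is introduced. Other curricula, such as ordering training examples from easy to hard acoustic conditions, would fit the same description.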