ISCA Archive Interspeech 2023

Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource MNMF with Localization Prior

Mieszko Fraś, Marcin Witkowski, Konrad Kowalczyk

Overlapping speech and strong room reverberation degrade the accuracy of automatic speech recognition (ASR). This paper proposes a method for jointly optimal source separation and dereverberation using delayed-subsource multichannel nonnegative matrix factorization (MNMF). We formulate a subsource-based signal model that accounts for late room reverberation using time-delayed microphone signals from several past time frames. We then propose a maximum a posteriori (MAP) estimator based on MNMF with a localization prior on the mixing matrix, suitable for estimating both the direct-path and reverberant signal components. Finally, we derive two algorithms, the original and the simplified delayed-subsource MNMF, both of which are shown to outperform many state-of-the-art approaches. Experimental evaluations on real and simulated data indicate superior performance of the proposed processing in terms of both word error rate (WER) and signal-to-distortion ratio (SDR).
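
To illustrate the idea of using time-delayed microphone signals from several past time frames, the following is a minimal sketch, not the formulation used in the paper. It assumes the microphone signals are available as a complex STFT array of shape (frequency bins, time frames, microphones) and simply stacks the current frame with zero-padded copies of earlier frames along the channel axis. The function name `stack_delayed_frames` and the parameters `num_delays` and `delay_offset` are hypothetical and chosen only for this example.

```python
import numpy as np


def stack_delayed_frames(X, num_delays, delay_offset=1):
    """Stack current and past STFT frames along the channel axis.

    X            : complex array, shape (F, T, M) = (freq bins, frames, mics)
    num_delays   : number of past frames to append (assumed parameter)
    delay_offset : delay, in frames, of the first appended past frame
                   (assumed parameter)

    Returns an array of shape (F, T, M * (num_delays + 1)). Frames that
    would reach before the start of the signal are filled with zeros.
    """
    F, T, M = X.shape
    blocks = [X]  # current-frame observations come first
    for k in range(num_delays):
        d = delay_offset + k  # delay of this block, in frames
        delayed = np.zeros_like(X)
        if d < T:
            # frame t of the delayed block holds frame t - d of the input
            delayed[:, d:, :] = X[:, :T - d, :]
        blocks.append(delayed)
    return np.concatenate(blocks, axis=-1)


# Usage on random data: 257 bins, 100 frames, 4 microphones, 3 past frames.
rng = np.random.default_rng(0)
X = rng.standard_normal((257, 100, 4)) + 1j * rng.standard_normal((257, 100, 4))
X_stacked = stack_delayed_frames(X, num_delays=3)
print(X_stacked.shape)  # (257, 100, 16)
```

Stacking along the channel axis is only one possible arrangement of the delayed observations; the delayed-subsource model proposed here may organize and weight them differently.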