ISCA Archive Interspeech 2022
ISCA Archive Interspeech 2022

Convolutive Weighted Multichannel Wiener Filter Front-end for Distant Automatic Speech Recognition in Reverberant Multispeaker Scenarios

Mieszko Fras, Marcin Witkowski, Konrad Kowalczyk

The performance of automatic speech recognition (ASR) systems strongly deteriorates when the desired speech signal is contaminated with room reverberation and when the speech of interfering speakers overlaps. To achieve acceptable word error rates (WER) by distant ASR in multispeaker reverberant scenarios, source separation and dereverberation can be performed as front-end processing. An existing optimum filter suitable for this task is the recently proposed weighted power minimization distortionless response convolutional beamformer (WPD). In this paper, we introduce a novel speech enhancement front-end for improving the accuracy of back-end ASR in scenarios with multiple reverberant overlapping speakers. The convolutional weighted multichannel Wiener filter (CW-MWF) is optimum for the joint separation and dereverberation task, and it is derived from the convolutional weighted minimum mean square error (CW-MMSE) optimization criterion, presented recently by the current authors. The WER results of performed experiments indicate superior performance of the CW-MWF in real and simulated rooms, irrespective of the method used for filter parameter estimation and the DNN model used for back-end ASR.