ISCA Archive Interspeech 2023

On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR

Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan

Acoustic Echo Cancellation (AEC) is critical for accurate recognition of speech directed at a smart device that is playing audio. Previous work has shown that neural AEC models can significantly improve Automatic Speech Recognition (ASR) accuracy. In this paper, we train a conformer-based waveform-domain neural model to perform residual acoustic echo suppression (RAES) on the output of a linear AEC. We focus specifically on improving ASR accuracy in realistic mismatched test conditions when training on large-scale simulated data, as needed for production voice-interaction systems. Our key finding is that instead of naively using the best evaluation-time linear AEC configuration during neural RAES model training, using a weaker linear AEC generalizes significantly better, with 17-30% lower word error rate (WER) on a realistic re-recorded test set. Overall, the neural RAES model yields 38-53% WER reduction over the linear AEC alone.
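
For readers comparing the reported percentages, the following is a minimal sketch of how a relative WER reduction is typically computed. It assumes the figures above are relative (not absolute) reductions, which the abstract does not state explicitly. The function name `relative_wer_reduction` and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction of a system over a baseline, in percent.

    Both inputs are word error rates expressed in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values for illustration only (not taken from the paper).
baseline = 20.0  # e.g. WER with the baseline system
improved = 15.0  # e.g. WER with the improved system
print(f"{relative_wer_reduction(baseline, improved):.1f}% relative WER reduction")
```

Under this reading, a drop from a hypothetical 20.0% WER to 15.0% WER is a 25% relative reduction, even though the absolute difference is only 5 points.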