ISCA Archive Interspeech 2024

AR-NLU: A Framework for Enhancing Natural Language Understanding Model Robustness against ASR Errors

Emmy Phung, Harsh Deshpande, Ahmad Emami, Kanishk Singh

A major challenge with pipeline spoken language understanding systems is that errors in the upstream automatic speech recognition (ASR) engine adversely impact downstream natural language understanding (NLU) models. To address this challenge, we propose an ASR-Robust NLU (AR-NLU) framework that extends a pre-existing NLU model by training it simultaneously on two input streams: human-generated (gold) transcripts and noisy ASR transcripts. We apply contrastive learning to encourage the model to learn the same representations and predictions for both gold and ASR inputs, thereby enhancing its robustness against ASR noise. To demonstrate the effectiveness of this framework, we present two AR-NLU models: a Robust Intent DEtection (RIDE) model and an ASR-Robust BI-encoder for NameD Entity Recognition (AR-BINDER). Experimental results show that our proposed AR-NLU framework is applicable to various NLU models and significantly outperforms the original models in both sequence and token classification tasks.
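
The abstract describes training an NLU model on paired gold and ASR transcripts with a contrastive objective that pulls their representations together. The sketch below illustrates one plausible way such dual-stream training could be set up; it is not the authors' implementation, and the encoder interface, the in-batch InfoNCE-style contrastive loss, the loss weight `alpha`, and the temperature are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's exact code) of dual-stream training
# with a contrastive consistency term between gold and ASR inputs.
import torch
import torch.nn.functional as F
from torch import nn


class DualStreamClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_labels: int):
        super().__init__()
        self.encoder = encoder              # any text encoder returning (batch, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, inputs):
        h = self.encoder(inputs)            # pooled utterance representation (assumption)
        return h, self.classifier(h)


def info_nce(z_gold, z_asr, temperature=0.1):
    """In-batch contrastive loss: each gold/ASR pair is a positive;
    other utterances in the batch serve as negatives."""
    z_gold = F.normalize(z_gold, dim=-1)
    z_asr = F.normalize(z_asr, dim=-1)
    logits = z_gold @ z_asr.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(z_gold.size(0), device=z_gold.device)
    return F.cross_entropy(logits, targets)


def training_step(model, gold_batch, asr_batch, labels, alpha=0.5):
    h_gold, logits_gold = model(gold_batch)
    h_asr, logits_asr = model(asr_batch)
    # Task loss on both streams so the model predicts correctly from either input.
    task_loss = F.cross_entropy(logits_gold, labels) + F.cross_entropy(logits_asr, labels)
    # Contrastive term aligns gold and ASR representations of the same utterance.
    consistency = info_nce(h_gold, h_asr)
    return task_loss + alpha * consistency
```

The same pattern applies to token classification (e.g. the AR-BINDER setting) by computing the task loss per token and the consistency term over token or span representations, under the same alignment assumption between gold and ASR inputs.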