ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding

Xuxin Cheng, Ziyu Yao, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou

Spoken language understanding (SLU) is a critical task in task-oriented dialogue systems. However, automatic speech recognition (ASR) errors often impair the understanding performance. Despite many previous models have obtained promising results for improving ASR robustness in SLU, most of them treat clean manual transcripts and ASR transcripts equally during the fine-tuning stage. To tackle this issue, in this paper, we propose a novel method termed C²A-SLU. Specifically speaking, we add calculated cross attention to the original hidden states and apply contrastive attention to compare the input transcript with clean manual transcripts to distill the contrastive information, which can better capture distinctive features of ASR transcripts. Experiments on three datasets show that C²A-SLU surpasses existing models and achieves a new state-of-the-art performance, with a relative improvement of 3.4% in terms of accuracy over the previous best model on SLURP dataset.