ISCA Archive Interspeech 2023

Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition

Eesung Kim, Aditya Jajodia, Cindy Tseng, Divya Neelagiri, Taeyeon Ki, Vijendra Raj Apsingekar

In production scenarios that require frequent change, it is inefficient to repeatedly train and update the entire end-to-end (E2E) model for spoken language understanding (SLU). In this paper, we present a study on efficiently adapting E2E SLU models based on a pre-trained ASR model. Specifically, we propose an ASR-based E2E SLU model that integrates an additional decoder for SLU and a fusion module incorporating acoustic representations from the shared encoder and text transcript representations from the ASR decoder. Furthermore, we investigate the effectiveness of an adapter module that fine-tunes only a small number of parameters for semantic and transcript predictions. The experimental results show that the proposed model outperforms other competitive baselines in intent accuracy, SLU F1 score, and word error rate (WER) on the FSC, SLURP, and Samsung in-house SLU datasets.
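To make the two ideas named in the abstract concrete, the following is a minimal PyTorch sketch of (a) a fusion module that combines acoustic representations from the shared encoder with text representations from the ASR decoder, and (b) a bottleneck adapter that adds a small number of trainable parameters while the pre-trained ASR weights stay frozen. All module names, dimensions, and the concatenation-plus-pooling fusion choice are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: fusion of encoder/decoder representations plus a bottleneck
# adapter for parameter-efficient SLU adaptation. Not the paper's actual code.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-projection, nonlinearity, up-projection, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class FusionSLUHead(nn.Module):
    """Fuses acoustic and text representations, then predicts an intent label."""
    def __init__(self, acoustic_dim: int, text_dim: int, hidden: int, n_intents: int):
        super().__init__()
        self.proj = nn.Linear(acoustic_dim + text_dim, hidden)
        self.adapter = Adapter(hidden)
        self.classifier = nn.Linear(hidden, n_intents)

    def forward(self, acoustic: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # acoustic: (batch, T_a, acoustic_dim) from the shared ASR encoder
        # text:     (batch, T_t, text_dim) from the ASR decoder hidden states
        a = acoustic.mean(dim=1)           # mean pooling over time (assumption)
        t = text.mean(dim=1)
        fused = torch.cat([a, t], dim=-1)  # concatenation as one possible fusion choice
        h = torch.relu(self.proj(fused))
        return self.classifier(self.adapter(h))


if __name__ == "__main__":
    head = FusionSLUHead(acoustic_dim=256, text_dim=256, hidden=256, n_intents=31)
    acoustic = torch.randn(2, 120, 256)    # dummy encoder outputs
    text = torch.randn(2, 20, 256)         # dummy decoder hidden states
    print(head(acoustic, text).shape)      # torch.Size([2, 31])
```

In this setup, only the fusion head and adapter parameters would be updated during adaptation, which mirrors the abstract's motivation of avoiding full retraining of the E2E model for each production change.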