ISCA Archive Interspeech 2020

ASR Error Correction with Augmented Transformer for Entity Retrieval

Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu

Domain-agnostic Automatic Speech Recognition (ASR) systems often mistranscribe domain-specific words, which leads to failures in downstream tasks. In this paper, we present a post-editing ASR error correction method that uses the Transformer model for entity mention correction and retrieval. Specifically, we propose a novel augmented variant of the Transformer model that encodes both the word and phoneme sequences of an entity, and attends to phoneme information in addition to word-level information during decoding to correct mistranscribed named entities. We evaluate our method on both the ASR error correction task and the downstream retrieval task. Our method achieves a 48.08% entity error rate (EER) reduction on the ASR error correction task and a 26.74% mean reciprocal rank (MRR) improvement on the retrieval task. In addition, our augmented Transformer model significantly outperforms the vanilla Transformer model, with a 17.89% EER reduction and a 1.98% MRR increase, demonstrating the effectiveness of incorporating phoneme information in the correction model.
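
To make the two reported metrics concrete, the following is a minimal sketch of how EER and MRR might be computed. The abstract does not define either metric, so the definitions here are assumptions: EER is taken as the fraction of entity mentions whose transcription does not exactly match the reference, MRR is the mean of reciprocal 1-based ranks of the correct entity, and the reported percentages are assumed to be relative changes against a baseline. The function names and the example strings are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def entity_error_rate(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Assumed definition: share of entity mentions that differ from the reference."""
    if len(hyps) != len(refs) or not refs:
        raise ValueError("hyps and refs must be non-empty and of equal length")
    # Compare case-insensitively after trimming whitespace (an assumption).
    errors = sum(h.strip().lower() != r.strip().lower() for h, r in zip(hyps, refs))
    return errors / len(refs)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over queries; rank is 1-based, None means not retrieved."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` against `baseline` (negative means a reduction)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical entity mentions: one of two is mistranscribed.
    eer = entity_error_rate(["lady gaga", "the beetles"], ["lady gaga", "the beatles"])
    # Hypothetical retrieval ranks for four queries; the third query misses.
    mrr = mean_reciprocal_rank([1, 2, None, 4])
    print(eer, mrr, relative_change(0.5, 0.25))
```

Under these assumptions, an "EER reduction" corresponds to a negative `relative_change` between the baseline and corrected error rates, and an "MRR improvement" to a positive one.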