ISCA Archive Interspeech 2022

Learning to rank with BERT-based confidence models in ASR rescoring

Ting-Wei Wu, I-Fan Chen, Ankur Gandhe

We propose a learning-to-rank (LTR) approach to the ASR rescoring problem. The proposed LTR framework can flexibly incorporate a wide variety of linguistic, semantic, and implicit user feedback signals in the rescoring process. BERT-based confidence models (CM) that take into account both acoustic and text information are also proposed to provide features that better represent hypothesis quality to the LTR models. We show that knowledge of the entire N-best list is crucial for the confidence and LTR models to achieve the best rescoring results. Experimental results on de-identified Alexa data show the proposed LTR framework provides an additional 5.16% relative word error rate reduction (WERR) on top of a neural language model rescored ASR system. On LibriSpeech, a 9.38% WERR and a 13.63% WERR are observed on the test-clean and test-other sets, respectively.
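To make the listwise rescoring idea concrete, the sketch below illustrates one common way to train a learning-to-rank model over an ASR N-best list. It is a minimal, hypothetical example, not the paper's implementation: the NBestRanker model, the feature set (e.g., first-pass score, LM score, a confidence score), the ListNet-style loss, and the oracle relevance targets are all assumptions made for illustration.

# Minimal sketch of listwise learning-to-rank over an ASR N-best list.
# Assumed setup: each hypothesis has a feature vector, and the training
# target is an oracle relevance derived from per-hypothesis word errors.
import torch
import torch.nn as nn

class NBestRanker(nn.Module):
    """Scores every hypothesis in an N-best list from its features."""
    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_best, num_features) -> scores: (n_best,)
        return self.mlp(feats).squeeze(-1)

def listwise_loss(scores: torch.Tensor, relevance: torch.Tensor) -> torch.Tensor:
    # ListNet-style loss: cross-entropy between the softmax of the model
    # scores and the softmax of the oracle relevance, so the whole N-best
    # list is ranked jointly rather than each hypothesis independently.
    return -(torch.softmax(relevance, dim=-1)
             * torch.log_softmax(scores, dim=-1)).sum()

# Toy example: a 4-best list with 3 features per hypothesis.
ranker = NBestRanker(num_features=3)
feats = torch.randn(4, 3)                       # hypothesis feature vectors
relevance = torch.tensor([2.0, 0.5, 1.0, 0.0])  # e.g., negated word errors
loss = listwise_loss(ranker(feats), relevance)
loss.backward()
print(float(loss))

At inference time, the hypothesis with the highest model score would be selected as the rescored output; the listwise formulation is what lets the model exploit knowledge of the entire N-best list, which the abstract identifies as crucial.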