ISCA Archive Interspeech 2025

ASR Confidence Estimation using True Class Lexical Similarity Score

Nagarathna Ravi, Thishyan Raj T, Ravi Teja Chaganti, Vipul Arora

Deep Neural Networks (DNNs) often exhibit overconfidence, leading to poor confidence calibration in Automatic Speech Recognition (ASR) models. State-Of-The-Art (SOTA) approaches to confidence estimation are based on statistical measures or on auxiliary models trained in a supervised way using binary target scores, which, however, fail to capture the degree of error in substituted outputs. Continuous target scores use temporal alignment between predictions and ground truth, but are sensitive to inaccuracies in that alignment. To address these limitations, we propose a novel target score, True Class Lexical Similarity (TruCLeS), to train the auxiliary Confidence Estimation Model (CEM). TruCLeS is based on the true class probability and the lexical similarity between the prediction and the ground truth. Experiments with CTC and RNN-Transducer based ASR models support its superiority over SOTA approaches. The code is available.
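To make the contrast between binary and continuous target scores concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how TruCLeS combines its two ingredients, so everything below is an assumption made for illustration: lexical similarity is taken to be a normalized Levenshtein similarity, and the hypothetical `combined_target` function mixes it with the true class probability through a hypothetical weight `alpha`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_similarity(pred: str, ref: str) -> float:
    """Assumed similarity: 1 - normalized edit distance, in [0, 1]."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


def binary_target(pred: str, ref: str) -> float:
    """Binary target: 1 for an exact match, 0 for any error."""
    return 1.0 if pred == ref else 0.0


def combined_target(true_class_prob: float, pred: str, ref: str,
                    alpha: float = 0.5) -> float:
    """Hypothetical combination (assumption, not the paper's formula):
    a convex mix of true class probability and lexical similarity."""
    return alpha * true_class_prob + (1.0 - alpha) * lexical_similarity(pred, ref)
```

Under these assumptions, a near miss such as predicting "cut" for "cat" receives a binary target of 0 but a lexical similarity of 2/3, which is the kind of graded signal about substitution errors that a binary target cannot express.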