ISCA Archive Interspeech 2023

Word-level Confidence Estimation for CTC Models

Burin Naowarat, Thananchai Kongthaworn, Ekapol Chuangsuwanich

Measuring confidence in Automatic Speech Recognition (ASR) is important for ensuring the reliability of downstream applications. Previous work proposed the Confidence Estimation Module (CEM) for predicting confidences in autoregressive attention-based and neural transducer architectures. However, CEM for connectionist temporal classification (CTC) models has not been explored. In this work, we extend the idea of CEM to CTC models and further propose considering surrounding words when estimating confidences. Our experiments on four test sets in two languages demonstrate that our proposed method significantly reduces calibration errors for both common and rare words compared to naive confidences from the CTC softmax. Moreover, we show that the approach is also effective for hard words and out-of-domain test sets, indicating its potential as a reliable trigger for human intervention.
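To make "calibration error" concrete, the following minimal sketch computes the expected calibration error (ECE) over word-level confidences. This is an illustration only: the abstract does not state which calibration metric is used, so ECE with equal-width bins is an assumption, and the function name, the bin count, and the input format (one confidence and one correctness flag per word) are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE over word-level confidences (illustrative sketch)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin accumulators: summed confidence, number of correct words, word count.
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1 if ok else 0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's |accuracy - confidence| gap by its share of words.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece
```

Under this definition, a model that assigns confidence 0.9 to words it gets right only 75% of the time contributes a gap of 0.15 for that bin; a well-calibrated estimator drives such gaps toward zero.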