Deep Neural Networks (DNNs) often exhibit overconfidence, leading to poor confidence calibration in Automatic Speech Recognition (ASR) models. State-Of-The-Art (SOTA) approaches estimate confidence using statistical measures or auxiliary models trained in a supervised manner with binary target scores, which, however, fail to capture the degree of error in substituted outputs. Continuous target scores use temporal alignment between predictions and ground truth, but are prone to alignment inaccuracies. To address these limitations, we propose a novel target score, True Class Lexical Similarity (TruCLeS), for training an auxiliary Confidence Estimation Model (CEM). TruCLeS combines the true class probability with the lexical similarity between the prediction and the ground truth. Experiments with CTC- and RNN-Transducer-based ASR models demonstrate its superiority over SOTA approaches. The code is available.
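To make the idea concrete, the following is a minimal illustrative sketch, not the paper's actual formulation: it assumes a multiplicative combination of the true class probability and a lexical similarity, and uses `difflib.SequenceMatcher` as a stand-in similarity measure; the function name `trucles_target` is hypothetical.

```python
from difflib import SequenceMatcher

def trucles_target(true_class_prob: float, prediction: str, reference: str) -> float:
    """Hypothetical sketch of a TruCLeS-style continuous target score.

    Combines the model's true class probability with the lexical
    similarity between the predicted and reference strings, so that
    near-miss substitutions receive a graded (rather than binary) target.
    The multiplicative combination and the similarity measure here are
    illustrative assumptions, not the paper's exact definition.
    """
    similarity = SequenceMatcher(None, prediction, reference).ratio()
    return true_class_prob * similarity

# A correct prediction keeps the full true-class probability as its target,
# while a close substitution ("ship" vs "sheep") gets a graded score and a
# completely wrong output is pushed toward zero.
print(trucles_target(0.9, "sheep", "sheep"))
print(trucles_target(0.9, "ship", "sheep"))
print(trucles_target(0.9, "cat", "dog"))
```

Unlike a binary correct/incorrect target, this graded score distinguishes a near-miss substitution from an entirely wrong one, which is the limitation of binary targets the abstract highlights.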