ISCA Archive Interspeech 2020

Confidence Measures in Encoder-Decoder Models for Speech Recognition

Alejandro Woodward, Clara Bonnín, Issey Masuda, David Varas, Elisenda Bou-Balust, Juan Carlos Riveiro

Recent improvements in Automatic Speech Recognition (ASR) systems have enabled the growth of myriad applications such as voice assistants, intent detection, keyword extraction and sentiment analysis. These applications, now widely used in industry, are very sensitive to the errors generated by ASR systems. This sensitivity could be mitigated by attaching a reliable confidence measure to the predicted output. This work presents a novel method that uses internal neural features of a frozen ASR model to train an independent neural network to predict a softmax temperature value. This value is computed at each decoder time step and multiplied by the logits in order to redistribute the output probabilities. The resulting softmax values of the predicted tokens constitute a more reliable confidence measure. This work also studies the effect of teacher forcing on the training of the proposed temperature prediction module. The output confidence estimates show an improvement of -25.78% in equal error rate (EER, lower is better) and +7.59% in area under the ROC curve (AUC-ROC) with respect to the unaltered softmax values of the predicted tokens, evaluated on a proprietary dataset of News and Entertainment videos.
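The confidence pipeline described above can be illustrated with a minimal sketch. It assumes the per-step decoder logits and the per-step temperature values are already available; the temperature prediction network itself (trained on internal features of the frozen ASR model) is not shown, and the `temps` list stands in for its outputs. Following the abstract's wording, the temperature multiplies the logits. The function names are hypothetical, and the EER and AUC-ROC helpers use simple textbook definitions (assuming both correct and erroneous tokens are present), not necessarily the evaluation protocol used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def scaled_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over logits multiplied by a positive temperature value.

    As in the abstract, the value multiplies the logits, so T > 1 sharpens
    the distribution and 0 < T < 1 flattens it.
    """
    scaled = [temperature * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, step_temps) -> List[Tuple[int, float]]:
    """For each decoder step, return (predicted token id, confidence).

    A positive temperature does not change the argmax, so the transcription
    is unchanged; only the probability assigned to each prediction moves.
    """
    out = []
    for logits, temp in zip(step_logits, step_temps):
        probs = scaled_softmax(logits, temp)
        token = max(range(len(probs)), key=probs.__getitem__)
        out.append((token, probs[token]))
    return out


def auc_roc(scores, labels) -> float:
    """Probability that a correct token (label 1) scores higher than an
    erroneous one (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eer(scores, labels) -> float:
    """Equal error rate, approximated at the threshold where the false
    acceptance rate (errors kept) and the false rejection rate (correct
    tokens discarded) are closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for thr in sorted(set(scores)) + [math.inf]:
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr) / n_neg
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr) / n_pos
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.1], [0.5, 0.4, 0.3], [3.0, 0.2, 0.1]]
    temps = [1.0, 0.5, 2.0]  # hypothetical outputs of a temperature predictor
    scores = [c for _, c in token_confidences(logits, temps)]
    labels = [1, 0, 1]  # hypothetical: 1 = token matches the reference
    print(auc_roc(scores, labels), eer(scores, labels))  # prints "1.0 0.0"
```

Because the temperature only rescales the logits, the decoded tokens stay the same and only the confidence scores change; this is what lets the module be trained independently of a frozen ASR model.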

doi: 10.21437/Interspeech.2020-2215

Cite as: Woodward, A., Bonnín, C., Masuda, I., Varas, D., Bou-Balust, E., Riveiro, J.C. (2020) Confidence Measures in Encoder-Decoder Models for Speech Recognition. Proc. Interspeech 2020, 611-615, doi: 10.21437/Interspeech.2020-2215

  author={Alejandro Woodward and Clara Bonnín and Issey Masuda and David Varas and Elisenda Bou-Balust and Juan Carlos Riveiro},
  title={{Confidence Measures in Encoder-Decoder Models for Speech Recognition}},
  booktitle={Proc. Interspeech 2020},