ISCA Archive Interspeech 2023

Prefix Search Decoding for RNN Transducers

Kiran Praveen, Advait Vinay Dhopeshwarkar, Abhishek Pandey, Balaji Radhakrishnan

Recurrent Neural Network Transducers (RNN-Ts) have surged in popularity for Automatic Speech Recognition (ASR) in recent years and show much promise. RNN-Ts were introduced as an extension of Connectionist Temporal Classification (CTC) models. Although prefix search is a widely used decoding strategy for CTC models, it appears to have been overlooked for RNN-Ts in favour of other decoding strategies such as time-synchronous decoding (TSD) and alignment-synchronous decoding. In this work, we introduce prefix search decoding for RNN-Ts, which scores a candidate by considering all prefixes in the decoding lattice. We show that our technique aligns more closely with the training objective than the existing strategies do. We compare our technique with the originally proposed TSD on the Librispeech and AMI-IHM datasets. We find that although prefix search is closer to the training objective, its effect depends on dataset size: performance improves significantly on larger datasets and degrades on smaller ones.
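To make the phrase "closer to the training objective" concrete, the following is a minimal sketch of the standard RNN-T forward recursion, which sums the probability of a label sequence over every alignment in the lattice rather than keeping only the best one. It is not the prefix search algorithm of this paper. The function name `rnnt_sequence_log_prob`, the nested-list layout `log_probs[t][u][k]`, and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_sequence_log_prob(log_probs, labels, blank=0):
    """Log-probability of `labels`, summed over all RNN-T alignments.

    log_probs[t][u][k] is the log-probability of symbol k at frame t after
    u labels have been emitted (hypothetical layout: T x (U+1) x V).
    """
    T = len(log_probs)
    U = len(labels)
    # alpha[t][u]: log-probability of reaching lattice node (t, u).
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive from the previous frame by emitting blank.
            stay = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Arrive from the previous label position by emitting labels[u - 1].
            emit = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = logsumexp2(stay, emit)
    # A final blank closes the sequence at the last lattice node.
    return alpha[T - 1][U] + log_probs[T - 1][U][blank]


# Toy lattice: 2 frames, 1 label, vocabulary {blank, "a"}, uniform probabilities.
half = math.log(0.5)
toy = [[[half, half] for _ in range(2)] for _ in range(2)]
total = math.exp(rnnt_sequence_log_prob(toy, [1]))
```

In this toy lattice there are two alignments, each with probability 0.125, so the summed score is twice the score of any single alignment. A decoder that ranks candidates by one alignment alone would therefore underestimate the quantity that training maximises.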