ISCA Archive Interspeech 2025

Improving Generalization of End-to-End ASR through Diversity and Independence Regularization

Ye-Eun Ko, Mun-Hak Lee, Dong-Hyun Kim, Joon-Hyuk Chang

Automatic speech recognition (ASR) has been driven by representative end-to-end model architectures, including connectionist temporal classification (CTC), attention-based encoder-decoder (AED), and recurrent neural network transducer (RNN-T) models. However, these models are prone to overfitting during training, which degrades their generalization performance. In this paper, we propose two novel regularization losses applicable to various ASR models: a diversity loss and an independence loss. The diversity loss reduces the similarity between feature representations, encouraging the model to learn diverse patterns. The independence loss minimizes the covariance between feature vectors, ensuring that they contain independent information and reducing redundancy. We apply these losses to CTC, AED, and RNN-T models and demonstrate through extensive experiments that the proposed regularization effectively improves model generalization and robustness.
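The abstract does not give the exact formulations of the two losses; one plausible reading is a mean squared pairwise cosine similarity for the diversity term and a squared off-diagonal covariance penalty for the independence term. The sketch below implements that reading in NumPy — the function names, normalizations, and exact forms are assumptions, not the paper's definitions.

```python
import numpy as np

def diversity_loss(feats: np.ndarray) -> float:
    """Assumed diversity term: mean squared pairwise cosine similarity.

    feats: (T, D) array of T feature vectors. Minimizing this pushes
    representations at different positions away from one another,
    encouraging the model to learn diverse patterns.
    """
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = z @ z.T                        # (T, T) cosine similarities
    t = feats.shape[0]
    off = sim - np.eye(t)                # drop self-similarity on the diagonal
    return float((off ** 2).sum() / (t * (t - 1)))

def independence_loss(feats: np.ndarray) -> float:
    """Assumed independence term: squared off-diagonal covariance.

    Driving the off-diagonal entries of the feature covariance matrix
    toward zero decorrelates the dimensions, so each one carries
    non-redundant information.
    """
    z = feats - feats.mean(axis=0)
    cov = (z.T @ z) / (feats.shape[0] - 1)   # (D, D) covariance matrix
    off = cov - np.diag(np.diag(cov))        # zero out the variances
    return float((off ** 2).sum() / feats.shape[1])
```

In practice, terms like these would be added to the base ASR objective (CTC, AED cross-entropy, or RNN-T loss) with tunable weights; the weighting scheme here is likewise an assumption.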