ISCA Archive Interspeech 2025

Knowledge Distillation Method for Pruned RNN-T Models via Pruning Bounds Sharing and Losses Confusion

Xiaocan Zhang, Weiwei Jiang, Guibin Zheng, Chenhao Jing, Jiqing Han, Tieran Zheng

Although the advantages of large models have been widely demonstrated in speech recognition, small models are still required in many applications because of limited computational resources or training data, and their recognition accuracy remains a challenging issue. This paper proposes a distillation method for the pruned RNN-T architecture that enhances the generalization ability of small models by leveraging information from large models: the small model shares the pruning bounds of the large model as well as its decoder and connector structures, and distillation is performed through multi-loss fusion. On the Chinese speech dataset AISHELL-1, experimental results demonstrate that the small model distilled from a pre-trained large model significantly outperforms a directly trained model of the same size, achieving a notable relative reduction of 30.4% in Character Error Rate (CER) and thereby validating the effectiveness of the proposed knowledge distillation method.
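As a rough illustration of the multi-loss fusion described in the abstract, the sketch below combines the standard RNN-T loss on ground-truth transcripts with a temperature-softened KL-divergence distillation term between teacher and student joint-network outputs, computed over the same lattice positions (consistent with the student sharing the teacher's pruning bounds). All names and hyperparameters (`student_logits`, `alpha`, `temperature`, etc.) are illustrative assumptions, not the authors' code; the paper's exact loss composition may differ.

```python
# Minimal sketch of multi-loss distillation for a transducer model,
# assuming a PyTorch/torchaudio setup. Hypothetical example, not the
# authors' implementation.
import torch
import torch.nn.functional as F
import torchaudio


def distillation_loss(student_logits, teacher_logits, targets,
                      logit_lengths, target_lengths,
                      blank=0, alpha=0.5, temperature=2.0):
    """Fuse the hard-label RNN-T loss with a soft-label KL term.

    student_logits / teacher_logits: (B, T, U+1, V) joint-network
    outputs; here both are assumed to cover the same (pruned) lattice
    positions because the student shares the teacher's pruning bounds.
    """
    # Hard-label transducer loss against the ground-truth transcripts.
    rnnt = torchaudio.functional.rnnt_loss(
        student_logits, targets, logit_lengths, target_lengths,
        blank=blank)

    # Soft-label KL divergence between teacher and student output
    # distributions, softened by a temperature as in standard
    # knowledge distillation (Hinton et al.), then rescaled by T^2.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean") * temperature ** 2

    # Weighted fusion of the two losses; alpha trades off ground-truth
    # supervision against teacher guidance.
    return alpha * rnnt + (1.0 - alpha) * kl
```

In practice the teacher's logits would be computed with `torch.no_grad()` from the frozen pre-trained large model, and only the student parameters would receive gradients from the fused loss.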