ISCA Archive Interspeech 2022

An Improved Deliberation Network with Text Pre-training for Code-Switching Automatic Speech Recognition

Zhijie Shen, Wu Guo

This paper proposes an improved deliberation network (DN) for end-to-end code-switching (CS) automatic speech recognition (ASR). In a conventional DN, the acoustic encoding and the first-pass hypothesis encoding are computed separately and simply combined by summation, which cannot take full advantage of their potential complementarity. The proposed improved DN therefore exploits the relationship between the two encodings in two stages: first, a shared encoder integrates the two encodings into a unified semantic space; second, an attention mechanism captures the relevant information from the acoustic encoding before the final decoding step. Moreover, the lack of paired training data restricts the generalization ability of the model in CS ASR. To address this problem, the proposed DN is pre-trained with a denoising sequence-to-sequence (seq2seq) objective on unpaired text data. Experiments on a Chinese-English CS dataset demonstrate the effectiveness of the proposed method: compared with the conventional DN, a 13.5% relative error rate reduction is observed.
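To make the two-stage idea concrete, the sketch below illustrates one possible shape of such a second pass: a shared encoder maps both the acoustic encoding and the embedded first-pass hypothesis into a common semantic space, and the second-pass decoder attends over the acoustic encoding alongside the hypothesis encoding before producing the final output. This is a minimal illustrative sketch, not the authors' implementation; all module names, layer sizes, and the concatenation-based attention memory are assumptions.

```python
# Minimal sketch (not the authors' implementation) of a deliberation-style
# second pass: a shared encoder unifies the acoustic encoding and the
# first-pass hypothesis encoding, and the second-pass decoder attends to
# the acoustic encoding before final decoding. Sizes are illustrative.
import torch
import torch.nn as nn


class DeliberationSecondPass(nn.Module):
    def __init__(self, vocab_size=5000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.hyp_embed = nn.Embedding(vocab_size, d_model)
        # Shared encoder applied to both encodings (unified semantic space).
        shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(shared_layer, num_layers)
        # Second-pass decoder cross-attends to the combined memory.
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.out_embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, acoustic_enc, first_pass_tokens, target_tokens):
        # acoustic_enc: (B, T_a, d_model) from the first-pass acoustic encoder
        # first_pass_tokens: (B, T_h) token ids of the first-pass hypothesis
        # target_tokens: (B, T_y) shifted targets for teacher forcing
        hyp_enc = self.shared_encoder(self.hyp_embed(first_pass_tokens))
        ac_enc = self.shared_encoder(acoustic_enc)
        # Concatenate along time so the decoder's attention can draw
        # complementary detail from the acoustic encoding.
        memory = torch.cat([ac_enc, hyp_enc], dim=1)
        dec_out = self.decoder(self.out_embed(target_tokens), memory)
        return self.proj(dec_out)  # (B, T_y, vocab_size) logits


# Toy usage with random inputs.
model = DeliberationSecondPass()
acoustic = torch.randn(2, 50, 256)
hyp = torch.randint(0, 5000, (2, 12))
tgt = torch.randint(0, 5000, (2, 12))
print(model(acoustic, hyp, tgt).shape)  # torch.Size([2, 12, 5000])
```

Under the text pre-training scheme described in the abstract, a component of this kind could first be trained on unpaired text with a denoising seq2seq objective (corrupted text in, clean text out) before fine-tuning on paired CS speech data; the exact noising procedure used by the authors is not specified here.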