ISCA Archive Interspeech 2023

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training

Yui Sudo, Shakeel Muhammad, Yifan Peng, Shinji Watanabe

End-to-end automatic speech recognition (ASR) has become an increasingly popular area of research, with two main types of models: online and offline. Online models aim to provide real-time transcription with minimal latency, whereas offline models wait until the end of the speech utterance before generating a transcription. In this work, we explore three techniques to maximize the performance of both models: 1) a joint parallel online and offline architecture for transducers; 2) dynamic block (DB) training, which allows flexible block size selection and improves the robustness of the offline mode; and 3) a novel time-synchronous one-pass beam search that uses both the online and offline decoders to further improve the performance of the offline mode. Experimental results show that the proposed method consistently reduces the character/word error rates on the CSJ and LibriSpeech datasets.
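The abstract does not specify how dynamic block training is implemented. The following is a minimal sketch, assuming it resembles dynamic chunk training: a block size is sampled per mini-batch (occasionally falling back to full context), and encoder self-attention is restricted to the current and all preceding blocks. The function names, the 0.5 full-context probability, and the block-size range are hypothetical choices for illustration, not values taken from the paper.

```python
import random
from typing import List, Optional


def sample_block_size(max_len: int, rng: random.Random,
                      full_context_prob: float = 0.5,
                      min_block: int = 1, max_block: int = 25) -> Optional[int]:
    """Hypothetical per-batch sampler (assumed, not from the paper).

    Returns None to mean "full context" (offline-style attention),
    otherwise a block size drawn uniformly from [min_block, upper].
    """
    if rng.random() < full_context_prob:
        return None
    # Never exceed the utterance length, but keep the range non-empty.
    upper = max(min_block, min(max_block, max_len))
    return rng.randint(min_block, upper)


def block_attention_mask(num_frames: int,
                         block_size: Optional[int]) -> List[List[bool]]:
    """mask[i][j] is True if frame i may attend to frame j.

    With a block size, each frame sees every frame in its own block and in
    all earlier blocks, but nothing in later blocks. Note: attention
    libraries differ on whether True means "attend" or "mask out", so check
    the convention before passing this mask to a real implementation.
    """
    if block_size is None:
        # Full context: every frame attends to every other frame.
        return [[True] * num_frames for _ in range(num_frames)]
    mask = []
    for i in range(num_frames):
        # Exclusive end index of the block containing frame i.
        block_end = min((i // block_size + 1) * block_size, num_frames)
        mask.append([j < block_end for j in range(num_frames)])
    return mask
```

For example, `block_attention_mask(5, 2)` lets frames 0 and 1 see only frames 0 and 1, frames 2 and 3 see frames 0 through 3, and frame 4 see all five frames. Sampling the block size per batch is one plausible way a single model could be exposed to many latency settings during training; the paper itself should be consulted for the actual schedule.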