Most current automatic speech recognition (ASR) decoders have no access to global contextual information at the token level. We therefore propose a decoder structure that incorporates text-level global contextual information. We construct the global information encoder on top of non-autoregressive recognition; to relax the non-autoregressive conditional independence assumption, we add a self-attention layer with rotary position encoding. The resulting text-level global context is then fused into the decoder through cross-attention, yielding a context-aware decoder. Our model achieves a character error rate (CER) of 3.92% on the AISHELL-1 validation set and 4.35% on the test set, a reduction of 1.72% (dev) / 2.13% (test) over the baseline model, reaching state-of-the-art performance. Finally, we use visualization techniques to explain the role of global information in the decoder.
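As a rough illustration of the fusion described above, the following PyTorch sketch shows one possible shape of the idea: a self-attention layer with rotary position encoding (a rotate-half RoPE variant is assumed here) over non-autoregressive token embeddings produces the text-level global context, and a decoder layer consumes that context via cross-attention. All module names, the placement of RoPE before the attention projections, and the single-layer structure are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def rotary_embed(x):
    # Rotate-half RoPE variant (assumed): injects absolute position into
    # queries/keys so attention becomes position-aware. x: (B, T, D), D even.
    b, t, d = x.shape
    half = d // 2
    pos = torch.arange(t, dtype=x.dtype, device=x.device)
    freqs = 10000 ** (-torch.arange(0, half, dtype=x.dtype, device=x.device) / half)
    angles = pos[:, None] * freqs[None, :]          # (T, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GlobalContextEncoder(nn.Module):
    """Self-attention over non-autoregressive token embeddings; RoPE
    reintroduces token order, relaxing the NAR independence assumption."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                      # tokens: (B, T, D)
        qk = rotary_embed(tokens)                   # position-aware queries/keys
        ctx, _ = self.attn(qk, qk, tokens)          # values keep raw content
        return self.norm(tokens + ctx)              # text-level global context

class ContextDecoderLayer(nn.Module):
    """Decoder layer that fuses the global context via cross-attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, dec, global_ctx):             # dec: (B, T, D)
        h, _ = self.self_attn(dec, dec, dec)
        dec = self.norm1(dec + h)
        h, _ = self.ctx_attn(dec, global_ctx, global_ctx)  # inject global info
        return self.norm2(dec + h)

# Toy usage with random tensors standing in for NAR first-pass embeddings.
B, T, D = 2, 10, 64
ctx = GlobalContextEncoder(D)(torch.randn(B, T, D))
out = ContextDecoderLayer(D)(torch.randn(B, T, D), ctx)   # (B, T, D)
```

The design point this sketch captures is that RoPE is applied only to queries and keys, so content vectors stay untouched while attention scores become order-sensitive; the decoder then reads the resulting context with ordinary cross-attention.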