In many speech separation methods, the contextual information contained in the feature sequence is modeled mainly by recurrent layers and/or self-attention mechanisms. However, how to combine these two powerful approaches more effectively remains to be explored. In this paper, a recurrent attention with parallel branches is proposed that first fully exploits the contextual information contained in the time-frequency (T-F) features; this information is then further modeled by recurrent modules in the conventional manner. Specifically, the proposed recurrent attention consists of two attention modules stacked sequentially. Each attention module has two parallel self-attention branches that model dependencies along the two axes, and one convolutional layer for feature fusion. In this way, the contextual information in the T-F features is fully exploited before being further modeled by the recurrent modules. Experimental results demonstrate the effectiveness of the proposed method.
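To make the described structure concrete, the following is a minimal PyTorch sketch of one plausible reading of the architecture: two stacked attention modules, each with parallel self-attention branches along the time and frequency axes fused by a convolutional layer, followed by a recurrent module. All class names, head counts, the LSTM choice, and the 1x1 fusion kernel are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of the described architecture; names and
# hyperparameters are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class AxialAttentionBranch(nn.Module):
    """Self-attention applied along a single axis (time or frequency)."""

    def __init__(self, channels, num_heads, axis):
        super().__init__()
        self.axis = axis  # "time" or "freq"
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (batch, channels, time, freq)
        b, c, t, f = x.shape
        if self.axis == "time":
            # fold frequency into the batch, attend over time steps
            seq = x.permute(0, 3, 2, 1).reshape(b * f, t, c)
        else:
            # fold time into the batch, attend over frequency bins
            seq = x.permute(0, 2, 3, 1).reshape(b * t, f, c)
        out, _ = self.attn(seq, seq, seq)
        seq = self.norm(seq + out)  # residual connection + layer norm
        if self.axis == "time":
            return seq.reshape(b, f, t, c).permute(0, 3, 2, 1)
        return seq.reshape(b, t, f, c).permute(0, 3, 1, 2)


class ParallelBranchAttention(nn.Module):
    """One attention module: two parallel axial branches fused by a conv layer."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.time_branch = AxialAttentionBranch(channels, num_heads, "time")
        self.freq_branch = AxialAttentionBranch(channels, num_heads, "freq")
        # 1x1 convolution fuses the concatenated branch outputs back to `channels`
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        fused = torch.cat([self.time_branch(x), self.freq_branch(x)], dim=1)
        return self.fuse(fused)


class RecurrentAttention(nn.Module):
    """Two attention modules stacked sequentially, then a recurrent module."""

    def __init__(self, channels, num_heads=4, hidden=128):
        super().__init__()
        self.attn1 = ParallelBranchAttention(channels, num_heads)
        self.attn2 = ParallelBranchAttention(channels, num_heads)
        self.rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, channels)

    def forward(self, x):
        # x: (batch, channels, time, freq)
        x = self.attn2(self.attn1(x))
        b, c, t, f = x.shape
        # run the recurrent module over time, folding frequency into the batch
        seq = x.permute(0, 3, 2, 1).reshape(b * f, t, c)
        seq, _ = self.rnn(seq)
        seq = self.proj(seq)
        return seq.reshape(b, f, t, c).permute(0, 3, 2, 1)


# Usage example (channels must be divisible by num_heads):
# model = RecurrentAttention(channels=64)
# y = model(torch.randn(2, 64, 100, 129))  # (batch, channels, time, freq)
```

One design point worth noting in this sketch: running the two axial branches in parallel on the same input (rather than sequentially) lets each branch see unmixed dependencies along its own axis, with the fusion convolution left to combine the two views.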