Learning-based methods have made impressive strides in speech separation, and the implicit filter-and-sum network (iFaSNet) stands out as a reliable multi-channel solution. Meanwhile, TF-GridNet has achieved state-of-the-art performance on the WSJ0-2mix dataset, demonstrating the potential of time-frequency (T-F) domain speech separation methods. This paper investigates the possibility of constructing a T-F domain filter-and-sum network that improves upon the iFaSNet. In addition to optimizing the separation module, we develop a narrow-band spatial feature as a cross-channel feature and a convolution module for context decoding. With these enhancements, we redesign each module of the iFaSNet architecture so that it operates entirely in the T-F domain; the proposed method is therefore referred to as TF-FaSNet. Experimental results on fixed microphone array geometries show that TF-FaSNet outperforms the standard iFaSNet under all conditions with similar model complexity.