Singing melody extraction is an important task in music information retrieval. In this paper, we propose a multi-band time-frequency attention network (MTANet) for singing melody extraction from polyphonic music, which generates feature representations that characterize the fundamental frequency (F0) component. Specifically, a band partition scheme is proposed to match the positional distribution of the F0 component, three hourglass sub-networks are used to capture features from the different frequency bands, and a feature fusion module (FFM) is employed to fuse the multi-band features. Visualization analysis shows that the multi-band feature extraction branch effectively generates feature representations that characterize the F0 component. Experimental results show that the MTANet outperforms existing state-of-the-art methods while using fewer network parameters, and visualized results intuitively show that the MTANet reduces octave errors and melody detection errors.
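To make the multi-band idea concrete, the sketch below is a minimal, illustrative rendering of the pipeline named in the abstract, not the authors' implementation: the input time-frequency representation is split along the frequency axis into bands, each band is processed by a small encoder-decoder branch standing in for one hourglass sub-network, and the branch outputs are fused by a simple stand-in for the FFM. The band boundaries (equal widths here, whereas the paper's scheme follows the F0 position distribution), channel sizes, and layer choices are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class TinyHourglass(nn.Module):
    """A very small encoder-decoder block standing in for one hourglass branch."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # downsample time and frequency
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # restore resolution
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


class MultiBandSketch(nn.Module):
    """Splits the frequency axis into three bands, runs one branch per band,
    and fuses the re-stacked outputs with a 1x1 convolution (FFM stand-in)."""

    def __init__(self, freq_bins: int = 360):
        super().__init__()
        # Hypothetical equal-width band partition; the paper's partition is
        # instead designed to fit the F0 position distribution.
        third = freq_bins // 3
        self.bounds = [(0, third), (third, 2 * third), (2 * third, freq_bins)]
        self.branches = nn.ModuleList(TinyHourglass() for _ in self.bounds)
        self.fuse = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, freq_bins, time_frames)
        outs = [
            branch(spec[:, :, lo:hi, :])
            for branch, (lo, hi) in zip(self.branches, self.bounds)
        ]
        # Re-stack the per-band outputs along the frequency axis and fuse.
        return self.fuse(torch.cat(outs, dim=2))


if __name__ == "__main__":
    model = MultiBandSketch(freq_bins=360)
    salience = model(torch.randn(2, 1, 360, 128))
    print(salience.shape)  # torch.Size([2, 1, 360, 128])
```

In such a design, the per-band branches let each sub-network specialize in the frequency range where F0 components of a given register tend to fall, while the fusion step recombines them into a single full-band salience-style output.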