The over-smoothing problem in the middle- and high-frequency regions prevents the acoustic model from generating high-quality singing voices. In this paper, we propose XiaoiceSing2, a generative adversarial network consisting of a FastSpeech-based generator and a multi-band discriminator, to generate the full-band mel-spectrogram. Specifically, we improve the feed-forward Transformer (FFT) block by adding multiple residual convolutional blocks in parallel with the self-attention block to balance local and global features. The multi-band discriminator contains three sub-discriminators responsible for the low-, middle-, and high-frequency parts of the mel-spectrogram, respectively. Each sub-discriminator is composed of several segment discriminators (SD) and detail discriminators (DD) to distinguish the audio from different aspects. Experiments on our internal 48 kHz singing voice dataset show that XiaoiceSing2 significantly improves the quality of the singing voice over XiaoiceSing.
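To illustrate the band partitioning behind the multi-band discriminator, the sketch below splits a full-band mel-spectrogram into low-, middle-, and high-frequency slices, one per sub-discriminator. This is a minimal illustration only: the band boundaries, the number of mel bins, and the function name `split_mel_bands` are assumptions for this example, not the paper's actual configuration.

```python
import numpy as np

def split_mel_bands(mel, bounds=(40, 80)):
    """Split a mel-spectrogram of shape (frames, n_mels) along the
    frequency axis into low, middle, and high bands.

    The boundaries (40, 80) are illustrative, not from the paper."""
    low = mel[:, :bounds[0]]
    mid = mel[:, bounds[0]:bounds[1]]
    high = mel[:, bounds[1]:]
    return low, mid, high

# Assumed example: 200 frames, 120 mel bins for a full-band spectrogram.
mel = np.random.randn(200, 120)
low, mid, high = split_mel_bands(mel)
print(low.shape, mid.shape, high.shape)  # (200, 40) (200, 40) (200, 40)
```

Each slice would then be fed to the corresponding sub-discriminator, so that over-smoothing in the middle and high bands is penalized directly rather than averaged into a single full-band score.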