Audio super-resolution (ASR) aims to reconstruct a high-resolution
signal from its low-resolution counterpart, which is difficult when
the correlation between the two is low.
In this paper, we propose a learning model, QISTA-Net-Audio, that solves
ASR by formulating it as a linear inverse problem. QISTA-Net-Audio is
composed of two components.
First, an audio waveform can be represented as a complex-valued spectrum,
which is composed of a real and an imaginary part, in the frequency
domain. We treat the real and imaginary parts as an image and, from the
viewpoint of image reconstruction, predict a high-resolution spectrum, of
which we keep only the phase information. Second, we predict the magnitude
information by solving a sparse signal reconstruction problem. By
combining the predicted magnitude and phase, we can recover
the high-resolution waveform. Comparison with the state-of-the-art
method MfNet [1] in terms of the SNR, PESQ, and STOI metrics
demonstrates the superior performance of our method.
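
The final recombination step described above can be illustrated with a minimal sketch. It is not the QISTA-Net-Audio implementation: the two learned predictors are replaced by ground-truth values taken from a toy signal, a single whole-signal FFT is assumed (the paper may use a different transform), and the function names `combine_magnitude_phase` and `recover_waveform` are hypothetical.

```python
import numpy as np


def combine_magnitude_phase(magnitude, phase):
    """Build a complex spectrum from separate magnitude and phase arrays."""
    return magnitude * np.exp(1j * phase)


def recover_waveform(magnitude, phase, n_samples):
    """Invert the one-sided spectrum back to a real-valued waveform."""
    spectrum = combine_magnitude_phase(magnitude, phase)
    return np.fft.irfft(spectrum, n=n_samples)


# Toy "high-resolution" waveform: two sinusoids (illustrative only).
n = 256
t = np.arange(n) / n
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# Complex spectrum, stored as a two-channel real/imaginary "image".
spec = np.fft.rfft(x)
real_imag_image = np.stack([spec.real, spec.imag])

# Stand-in for the first component: keep only the phase of the spectrum.
phase = np.arctan2(real_imag_image[1], real_imag_image[0])

# Stand-in for the second component: the magnitude, here taken as ground truth.
magnitude = np.abs(spec)

# Combine the two and return to the time domain.
x_rec = recover_waveform(magnitude, phase, n)
```

With exact magnitude and phase, the recovered waveform matches the original up to floating-point error. In the setting described above, both quantities would instead be model predictions.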