ISCA Archive Interspeech 2025

MelRe: Vision-Based Mel-Spectrogram Restoration

Kaixuan Luan, Xiaoda Yang, Shile Cai, Ruofan Hu, Minghui Fang, Wenrui Liu, Jialong Zuo, Jiaqi Duan, Yuhang Ma, Junyu Lu

As computer vision has advanced, its techniques have increasingly been applied in other fields. Mel spectrograms provide a bridge between audio features and visual models, and previous work has shown that applying image processing methods to them is feasible. However, conventional image models operate at a relatively coarse level, focusing primarily on texture and shape. Mel spectrograms, in contrast, are highly sensitive to detail: they encode complex time-frequency information that requires finer-grained modeling. To address this, we propose MelRe, a visual model designed specifically for mel spectrograms that tackles complex, fine-grained audio degradation from a visual perspective. MelRe meets the need for fine detail through pixel-level restoration, and it employs degradation alignment and noise simulation strategies to achieve high-precision restoration across varying levels of degradation. Experimental results show that MelRe achieves new state-of-the-art (SOTA) performance on complex audio restoration tasks, highlighting its potential for high-quality audio repair.
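To make the idea of restoring a mel spectrogram "as an image" under varying degradation levels more concrete, the following is a minimal sketch of how clean/degraded training pairs might be simulated. It is not MelRe's actual noise simulation strategy, which the abstract does not specify. The function name `degrade_mel`, the additive Gaussian noise model, the SNR levels, and the random stand-in for a log-mel spectrogram are all assumptions made for illustration.

```python
import numpy as np

def degrade_mel(mel, snr_db, rng):
    """Add Gaussian noise to a (n_mels, n_frames) array at a target SNR in dB.

    Hypothetical helper: a simple additive-noise model, assumed here
    only to illustrate degradation at controllable severity.
    """
    signal_power = np.mean(mel ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=mel.shape)
    return mel + noise

rng = np.random.default_rng(0)

# Stand-in for a log-mel spectrogram: 80 mel bins x 200 frames.
# A real pipeline would compute this from audio instead.
clean = rng.random((80, 200))

# One degraded copy per severity level; lower SNR means heavier degradation.
pairs = {snr_db: degrade_mel(clean, snr_db, rng) for snr_db in (20, 10, 0)}
```

Each entry of `pairs` has the same shape as `clean`, so a pixel-level restoration model could in principle be trained to map a degraded array back to its clean counterpart, with the SNR controlling how severe the simulated degradation is.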