ISCA Archive Interspeech 2025

TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain

Zixuan Li, Shulin He, Jinglin Bai, Xueliang Zhang

Neural networks that leverage both full-band and sub-band information have demonstrated exceptional performance across various speech processing tasks. In this paper, we examine the essential factors that allow these architectures to achieve state-of-the-art (SOTA) performance and identify their 'inplace modeling' capability as a critical component of this success. Following this principle, we introduce TF-SkiMNet, which employs Skipping Memory (SkiM) to perform global temporal modeling over full-band information with low computational overhead. For single-channel speech enhancement, TF-SkiMNet achieves performance comparable to the SOTA model TF-CrossNet while reducing MACs by 87%. Evaluated on both single-channel and multi-channel speech enhancement tasks, TF-SkiMNet also achieves SOTA performance under similar computational budgets.
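The abstract names SkiM without describing it. The NumPy sketch below illustrates why SkiM-style global temporal modeling is cheap. It follows a simplified reading of the original SkiM design and is not the TF-SkiMNet implementation. A segment LSTM runs over short, independent segments of length K. Memory LSTMs then run only over the S = T/K segment-level final states, and they pass information between segments by supplying the next block's initial states. Everything else is an assumption made for illustration: the input is treated as one (T, D) sequence (for example, one full-band feature vector per frame), the memory update is the causal variant, the residual projection is simplified, and the helper names `init_lstm`, `run_lstm`, and `skim_block` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(rng, in_dim, hid):
    # Hypothetical helper: random single-layer LSTM weights, gates ordered [i, f, g, o].
    scale = 1.0 / np.sqrt(hid)
    return (rng.uniform(-scale, scale, (in_dim, 4 * hid)),
            rng.uniform(-scale, scale, (hid, 4 * hid)),
            np.zeros(4 * hid))


def run_lstm(xs, h, c, params):
    """Run an LSTM over xs (batch, steps, in_dim); h, c are (batch, hid)."""
    W, U, b = params
    hid = h.shape[-1]
    outs = []
    for t in range(xs.shape[1]):
        z = xs[:, t] @ W + h @ U + b
        i = sigmoid(z[:, :hid])
        f = sigmoid(z[:, hid:2 * hid])
        g = np.tanh(z[:, 2 * hid:3 * hid])
        o = sigmoid(z[:, 3 * hid:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs, axis=1), h, c


def skim_block(x, seg_len, h0, c0, p):
    """One simplified SkiM-style block on x (T, D); assumes T % seg_len == 0.

    h0, c0: (S, H) per-segment initial states from the previous block's memory.
    Returns the block output (T, D) and the initial states for the next block.
    """
    T, D = x.shape
    S = T // seg_len
    segs = x.reshape(S, seg_len, D)  # segments form a batch and are processed independently
    seg_out, h_last, c_last = run_lstm(segs, h0, c0, p["seg"])
    y = x + (seg_out @ p["proj"]).reshape(T, D)  # residual connection (assumed projection H -> D)

    # Memory LSTMs step over the S segment-level states: S steps instead of T.
    zeros = np.zeros((1, h_last.shape[1]))
    mem_h, _, _ = run_lstm(h_last[None], zeros, zeros, p["mem_h"])
    mem_c, _, _ = run_lstm(c_last[None], zeros, zeros, p["mem_c"])
    # Causal variant (assumption): segment s of the next block starts from segments < s.
    h_next = np.concatenate([zeros, h_last[:-1] + mem_h[0, :-1]], axis=0)
    c_next = np.concatenate([zeros, c_last[:-1] + mem_c[0, :-1]], axis=0)
    return y, h_next, c_next


# Usage: two stacked blocks on a toy sequence (T=12 frames, D=4 features, K=3).
rng = np.random.default_rng(0)
T, D, H, K = 12, 4, 8, 3
S = T // K
p = {"seg": init_lstm(rng, D, H), "proj": 0.1 * rng.standard_normal((H, D)),
     "mem_h": init_lstm(rng, H, H), "mem_c": init_lstm(rng, H, H)}
x = rng.standard_normal((T, D))
y1, h1, c1 = skim_block(x, K, np.zeros((S, H)), np.zeros((S, H)), p)
y2, _, _ = skim_block(y1, K, h1, c1, p)
```

In this sketch, the per-frame cost is that of one small LSTM. The cross-segment memory adds only S recurrent steps per block. Context from earlier segments still reaches later segments in the next block, which is the kind of low-overhead global temporal modeling the abstract attributes to SkiM.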