ISCA Archive Interspeech 2022

Lightweight Full-band and Sub-band Fusion Network for Real Time Speech Enhancement

Zhuangqi Chen, Pingjian Zhang

Recent studies in deep-learning-based real-time speech enhancement have demonstrated the advantage of sub-band processing for parameter reduction. However, most sub-band methods apply the same model to all sub-bands, which limits the upper bound of performance, given that the spectral patterns in each sub-band are different. In this paper, we take this fact into account and propose a lightweight full-band and sub-band fusion network, in which a dual-branch architecture models local and global spectral patterns simultaneously. A simple yet effective sub-band module, the weighted progressive convolutional module, is designed with a small number of parameters and captures clean features progressively from a local perspective. Each sub-band is handled by its own module. A novel asymmetric convolutional recurrent network is also proposed to focus on full-band context and extract more robust global features, complementing the sub-band module. We have conducted extensive experiments on both the VoiceBank+Demand and the DNS Challenge datasets, and the results show that the proposed method achieves better performance than other state-of-the-art approaches with a smaller model size and lower latency.
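
The dual-branch idea described above (one dedicated module per sub-band, plus a full-band branch, with the two outputs fused) can be illustrated with a minimal NumPy sketch. This is not the proposed network: the plain linear maps below merely stand in for the weighted progressive convolutional module and the asymmetric convolutional recurrent network, and the equal-width band split, the additive fusion, the sigmoid mask, and all names (`split_subbands`, `fuse_fullband_subband`, `subband_weights`, `fullband_weight`) are assumptions made for illustration only.

```python
import numpy as np


def split_subbands(spec, num_bands):
    """Split a (freq_bins, frames) magnitude spectrogram into equal-width
    sub-bands along the frequency axis (equal widths are an assumption)."""
    return np.split(spec, num_bands, axis=0)


def fuse_fullband_subband(spec, subband_weights, fullband_weight):
    """Toy fusion of a local (sub-band) branch and a global (full-band) branch.

    Each sub-band gets its OWN weight matrix, mirroring "each sub-band is
    handled by its own module". Linear maps are hypothetical stand-ins for
    the actual modules.
    """
    bands = split_subbands(spec, len(subband_weights))
    # Local branch: one distinct linear map per sub-band, then re-stack.
    local = np.concatenate(
        [w @ band for w, band in zip(subband_weights, bands)], axis=0
    )
    # Global branch: a single map that sees all frequency bins at once.
    global_feat = fullband_weight @ spec
    # Assumed fusion: sum both branches and squash into a (0, 1) mask.
    mask = 1.0 / (1.0 + np.exp(-(local + global_feat)))
    return mask * spec


rng = np.random.default_rng(0)
freq_bins, frames, num_bands = 16, 5, 4
band_size = freq_bins // num_bands

spec = np.abs(rng.standard_normal((freq_bins, frames)))
subband_weights = [
    rng.standard_normal((band_size, band_size)) for _ in range(num_bands)
]
fullband_weight = rng.standard_normal((freq_bins, freq_bins))

enhanced = fuse_fullband_subband(spec, subband_weights, fullband_weight)
```

In this toy setting the sub-band branch holds `num_bands * band_size**2` weights while the full-band map holds `freq_bins**2`, which hints at why per-band processing can be parameter-efficient even when every band has its own module.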