ISCA Archive Interspeech 2024

Lightweight Dynamic Sparse Transformer for Monaural Speech Enhancement

Zehua Zhang, Xuyi Zhuang, Yukun Qian, Mingjiang Wang

Speech enhancement suppresses environmental noise and improves the intelligibility of speech signals, making it a key task in speech front-end processing. We propose a monaural speech enhancement model called the lightweight dynamic sparse Transformer (LDSTransformer). Motivated by complementarity, we adopt a dual-branch structure that combines a coarse branch and a fine branch, which estimate the magnitude spectrum and the complex spectrum, respectively. Both branches share a novel lightweight dynamic sparse Transformer block (LDSTB) that efficiently extracts deep time-frequency features. We further propose a deep feature aggregation block that aggregates the deep features extracted by the LDSTBs. On the blind test set of the 1st Deep Noise Suppression Challenge, under reverberant conditions, the proposed model achieves average improvements of 2.05 in WB-PESQ, 10.05% in STOI, and 10.65 dB in SI-SDR.
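
For readers unfamiliar with the last metric, the following is a minimal sketch of how an SI-SDR improvement is commonly computed, assuming the standard scale-invariant definition (zero-mean signals, with the estimate projected onto the reference). It is not the authors' evaluation code; the function names `si_sdr` and `si_sdr_improvement` are hypothetical and chosen only for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean of each signal, as in the usual definition.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Optimal scaling of the reference that best explains the estimate.
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(enhanced, noisy, reference):
    """Improvement in dB: SI-SDR of the enhanced signal minus that of the noisy input."""
    return si_sdr(enhanced, reference) - si_sdr(noisy, reference)
```

Because the estimate is projected onto the reference before the energies are compared, rescaling the enhanced signal by any nonzero constant leaves the score unchanged, which is why the metric is reported in dB without any level normalization.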