ISCA Archive Interspeech 2023

MSAF: A Multiple Self-Attention Field Method for Speech Enhancement

Minghang Chu, Jing Wang, Yaoyao Ma, Zhiwei Fan, Mengtao Yang, Chao Xu, Zhi Tao, Di Wu

Speech enhancement (SE) systems based on generative adversarial networks (GANs) remain limited in how much they improve speech quality and intelligibility. In this study, we propose a novel multiple self-attention field (MSAF) method for speech enhancement. Models whose self-attention layers are placed at different positions attend to different features. The output of each model is assigned a feature weight learned during training, and the models are then fused according to these weights to obtain the clean speech estimate. On speech quality, the proposed method improves CBAK, CSIG, COVL, and PESQ by 8.22%, 8.52%, 9.28%, and 9.40% on average over the baseline SASEGANs. The results show that MSAF comprehensively improves the performance of the baseline SASEGAN and outperforms mainstream GAN-based SE methods. Importantly, the proposed method can be extended to other GAN-based SE methods.
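As a rough illustration of the fusion step described above (not the authors' implementation), the sketch below fuses the enhanced waveforms produced by several models using learned scalar weights. The softmax normalization and the function name `fuse_outputs` are assumptions for the example; in practice the weights would be trained jointly with the networks.

```python
import math

def fuse_outputs(model_outputs, weights):
    """Fuse per-model enhanced signals with learned scalar weights.

    model_outputs: list of equal-length sample lists, one per model.
    weights: one raw (unnormalized) weight per model, learned in training.
    """
    # Softmax-normalize the weights so the fusion coefficients sum to 1
    # (an assumption for this sketch, not stated in the paper).
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Sample-wise weighted sum of the model outputs.
    n = len(model_outputs[0])
    return [sum(a * out[i] for a, out in zip(alphas, model_outputs))
            for i in range(n)]

# With equal raw weights, fusion reduces to a plain average of the outputs.
fused = fuse_outputs([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
# fused == [2.0, 3.0]
```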