ISCA Archive CHiME 2023

The SGU Systems for the CHiME-7 UDASE Challenge

Jaehoo Jang, Myoung-Wan Koo

In this work, we describe the SGU domain-adapted speech enhancement system, which improves on the baseline of the CHiME-7 UDASE challenge. We introduce two significant modifications. First, we replace the Sudo rm-rf [1] architecture with the Mossformer [2], which incorporates convolution-augmented joint local and global self-attention. It performs fully computed self-attention on local chunks and applies a linearized, low-cost self-attention over the entire sequence. Second, we add a speech purification technique to the baseline's self-supervised learning of the student model. This technique predicts the frame-level SNR of the pseudo-target speech and uses these predictions as weights for the discrepancy function between the pseudo-target speech and the student model's estimated speech. With both modifications, we achieve an SI-SDR of 12.42 dB on the LibriCHiME-5 dataset. On the CHiME-5 dataset, the Mossformer architecture yields an OVRL-MOS of 2.90 and a SIG-MOS of 3.39, and the purification method yields a BAK-MOS of 3.71. Overall, we show that our approach outperforms the baseline.
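The frame-level weighting used by the purification step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping framing, the mean-squared-error discrepancy, and the logistic mapping from SNR (in dB) to a weight are all assumptions made for illustration, and the per-frame SNR values are passed in directly rather than predicted by a model.

```python
import math


def frame_signal(signal, frame_len):
    """Split a 1-D sequence into non-overlapping frames, dropping any remainder."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def snr_to_weight(snr_db, midpoint_db=0.0, slope=0.5):
    """Assumed mapping: squash a frame SNR in dB into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def weighted_frame_loss(pseudo_target, student_estimate, frame_snr_db, frame_len):
    """SNR-weighted average of per-frame mean squared errors.

    Frames whose pseudo-target is estimated to be noisy (low SNR) contribute
    less to the loss than frames estimated to be clean (high SNR).
    """
    target_frames = frame_signal(pseudo_target, frame_len)
    estimate_frames = frame_signal(student_estimate, frame_len)
    if not (len(target_frames) == len(estimate_frames) == len(frame_snr_db)):
        raise ValueError("signals and SNR list must cover the same number of frames")

    weights = [snr_to_weight(snr) for snr in frame_snr_db]
    errors = [
        sum((t - e) ** 2 for t, e in zip(tf, ef)) / frame_len
        for tf, ef in zip(target_frames, estimate_frames)
    ]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * err for w, err in zip(weights, errors)) / total_weight


# Toy example: the second frame disagrees with the pseudo-target.
pseudo_target = [1.0] * 8
student_estimate = [1.0] * 4 + [0.0] * 4

# The mismatching frame is flagged as low-SNR, so it is strongly down-weighted.
loss_purified = weighted_frame_loss(pseudo_target, student_estimate, [20.0, -20.0], 4)
# Equal SNR on both frames reduces to a plain average of the frame errors.
loss_uniform = weighted_frame_loss(pseudo_target, student_estimate, [0.0, 0.0], 4)
```

In this toy case the mismatch falls in a frame marked as unreliable, so the weighted loss is far smaller than the uniformly weighted one. This is the intended effect of trusting the pseudo-target only where it is estimated to be clean.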