ISCA Archive CHiME 2024

The NPU-TEA System for the CHiME-8 NOTSOFAR-1 Challenge

Kaixun Huang, Yue Li, Ziqian Wang, Hongji Wang, Wei Rao, Zhaokai Sun, Zhiyuan Tang, Shen Huang, Yannan Wang, Tao Yu, Lei Xie, Shi-dong Shang

This paper presents NPU-TEA’s system submitted to the CHiME-8 NOTSOFAR-1 Challenge. Our system follows the architecture of the official baseline, which comprises continuous speech separation (CSS), automatic speech recognition (ASR), and speaker diarization (SD). We enhanced the CSS module by integrating WavLM Large, used language model rescoring to assist ASR decoding, and replaced the speaker embedding extraction model in the SD module with ResNet293. Without using any core challenge datasets other than NOTSOFAR, our best systems achieve tcpWERs of 28.58% and 21.42% on the single-channel and multi-channel dev-2 sets of NOTSOFAR-1, respectively, representing relative reductions of 37.65% and 32.11% over the baseline systems.
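The relative reductions quoted above follow the usual definition, (baseline − system) / baseline. The short sketch below makes that arithmetic explicit and, as an illustration only, back-calculates the baseline tcpWERs implied by the reported figures. The helper names `relative_reduction` and `implied_baseline` are hypothetical. Because the reported percentages are rounded to two decimals, the recovered baselines are approximate, not the official baseline numbers.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative error-rate reduction in percent: (baseline - system) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - system) / baseline * 100.0


def implied_baseline(system: float, reduction_pct: float) -> float:
    """Invert relative_reduction: recover a baseline from a system score and its reduction."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return system / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    # tcpWERs (%) and relative reductions (%) as reported in the abstract.
    reported = {"single-channel": (28.58, 37.65), "multi-channel": (21.42, 32.11)}
    for setup, (wer, red) in reported.items():
        base = implied_baseline(wer, red)  # approximate, due to rounding in the reported values
        check = relative_reduction(base, wer)  # should reproduce the reported reduction
        print(f"{setup}: implied baseline ~ {base:.2f}%, relative reduction = {check:.2f}%")
```

Running the script prints the approximate implied baselines; the round trip through `relative_reduction` recovers the reported reductions exactly, which is a quick sanity check on the definition.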