ISCA Archive Interspeech 2023

Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation

Yong Xu, Vinay Kothapally, Meng Yu, Shixiong Zhang, Dong Yu

Despite the recent success of all-neural beamforming approaches for speech separation, deploying them on low-power devices is difficult because of their demanding computational requirements. To address this issue, we present a lightweight on-device Mel-subband neural beamformer for in-car multi-zone speech separation and introduce several effective methods to boost its performance. First, we propose a global full-band spectral and spatial embedding to assist the separation for each Mel-subband. Second, an explicit distortionless constraint is incorporated to control the non-linear distortion. Finally, teacher-student learning and quantization-aware training (QAT) are used to improve performance and accelerate inference. Experimental results show that the proposed methods achieve a significant word error rate (WER) reduction on real-recorded data and a real-time factor (RTF) of 0.39 on the device.
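For readers unfamiliar with the metric, the real-time factor is conventionally the processing time divided by the duration of the audio processed, so a value below 1 means the system runs faster than real time. The following is a minimal sketch of that arithmetic only. It does not reproduce the authors' measurement setup, and the names `real_time_factor`, `measure_rtf`, and `process_fn` are hypothetical, as are the example numbers.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process_fn, samples, sample_rate: int) -> float:
    """Time a caller-supplied (hypothetical) process_fn on samples and return its RTF."""
    start = time.perf_counter()
    process_fn(samples)
    elapsed = time.perf_counter() - start
    # Audio duration in seconds is the number of samples divided by the sample rate.
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Illustrative numbers only (not from the paper): 3.9 s of compute
# for 10 s of audio corresponds to an RTF of 0.39.
example_rtf = real_time_factor(3.9, 10.0)
```

Under this definition, an RTF of 0.39 means each second of audio takes about 0.39 seconds to process on the target device.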