ISCA Archive Interspeech 2025

A Three-Stage Beamforming with Harmonic Guidance for Multi-Channel Speech Enhancement

Nurali Alip, Tianrui Wang, Rui Cao, Meng Ge, Jingru Lin, Longbiao Wang, Jianwu Dang

With the rapid advancement of multi-channel speech enhancement (MCSE) research, nonlinear spatial filtering methods that integrate spatial and spectral processing have become prevalent. However, many existing approaches overlook the explicit extraction of speech spectral structure information. This leads to insufficient learning of joint spatial-spectral information and limits performance under low signal-to-noise ratio (SNR) conditions. To address this, we propose a three-stage MCSE framework. The first stage uses an acoustic structure extraction module to capture speech spectral patterns from noisy inputs, enabling the exploration and interaction of spatial and spectral cues. The next two stages decouple enhancement into coarse enhancement and spectral refinement, combining full-band noise reduction with speech structure refinement. Experiments on LibriSpeech-based datasets demonstrate that the proposed method significantly outperforms the reference method.