ISCA Archive Interspeech 2023

SDNet: Stream-attention and Dual-feature Learning Network for Ad-hoc Array Speech Separation

Honglong Wang, Chengyun Deng, Yanjie Fu, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Fei Wang

Considerable progress has been made in multi-channel speech separation for fixed arrays. In this paper, we aim to develop a robust system for ad-hoc arrays that copes with uncertainty in both the locations and the number of microphones. Previous works commonly apply averaging across microphones in ad-hoc arrays, which overlooks the differences among microphones at various positions. Some studies suggest that microphones with a high signal-to-noise ratio (SNR) are more helpful in improving speech quality. Motivated by this, we propose a stream-attention and dual-feature learning network called SDNet. The key points are as follows: 1) We propose a dual-feature learning block with fewer parameters that better learns long-term dependencies. 2) Based on this high-quality speech representation, we further propose stream attention, which effectively handles microphone variability and allocates more attention to microphones with higher SNR. Experiments show that our proposed model outperforms other advanced baselines.
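The contrast between averaging and attention-based fusion across microphones can be illustrated with a minimal sketch. This is not the SDNet architecture, whose details are not given in the abstract: the function names, the per-microphone scalar scores, and the softmax weighting are all assumptions chosen for illustration. In a learned model the scores would come from the network. Here they are supplied by hand as a stand-in for "higher SNR gets a higher score".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-microphone scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(streams):
    """Baseline: every microphone stream contributes equally."""
    count = len(streams)
    return [sum(values) / count for values in zip(*streams)]


def attention_fusion(streams, scores):
    """Weighted fusion: streams with higher scores contribute more.

    `scores` is a hypothetical per-microphone quality score (assumed here
    to grow with SNR). It is an illustrative input, not the paper's method.
    """
    weights = softmax(scores)
    return [
        sum(w * v for w, v in zip(weights, values))
        for values in zip(*streams)
    ]


# Three microphones, two feature values each. Any number of microphones
# works, since both fusions reduce over the microphone axis.
streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
uniform = average_fusion(streams)
weighted = attention_fusion(streams, scores=[10.0, 0.0, 0.0])
```

With equal scores the softmax weights are uniform and the attention fusion reduces to plain averaging. With unequal scores the fused output moves toward the highest-scoring stream, which matches the intuition that higher-SNR microphones should receive more attention. Because the weights are normalized over however many streams are present, the same function applies unchanged when the number of microphones varies.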