ISCA Archive Interspeech 2018

Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor

Disong Wang, Yuexian Zou

Deep neural network (DNN) based DOA estimation (DNN-DOAest) methods report superior performance, but they degrade under strong additive noise and room reverberation. Motivated by our previous work with an acoustic vector sensor (AVS) and by the success of DNN based speech denoising and dereverberation (DNN-SDD), this paper investigates a unified DNN framework for robust DOA estimation. First, a novel DOA cue termed the sub-band inter-sensor data ratio (Sb-ISDR) is proposed to efficiently represent DOA information for training a DNN-DOAest model. Second, a speech-aware DNN-SDD is presented, in which coherence vectors denoting the probability that each time-frequency point is dominated by speech are used as additional input to facilitate training to predict complex ideal ratio masks. Last, the DNN-DOAest is stacked on the DNN-SDD through a joint part and the unified network is jointly fine-tuned, which enables the DNN-SDD to serve as a pre-processing front-end that adaptively generates ‘clean’ speech features that the following DNN-DOAest can classify correctly more easily. Experimental results on simulated and recorded data confirm the effectiveness of the proposed methods and their superiority over baseline methods under different noise and reverberation conditions.
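As an illustration of the training target mentioned above, the following is a minimal sketch of a complex ideal ratio mask in its standard textbook form, i.e. the complex mask M for which M times the noisy STFT equals the clean STFT at each time-frequency point. The function name `complex_ideal_ratio_mask`, the `eps` regularizer, and the toy input values are assumptions made for illustration; the abstract does not specify how the mask is computed, compressed, or bounded in the DNN-SDD.

```python
import numpy as np


def complex_ideal_ratio_mask(clean_stft, noisy_stft, eps=1e-8):
    """Complex mask M with M * noisy_stft ~= clean_stft (element-wise).

    Hypothetical helper: computes S / Y as S * conj(Y) / |Y|^2, with a
    small eps (an assumption) added to the denominator for stability.
    """
    y_r, y_i = noisy_stft.real, noisy_stft.imag
    s_r, s_i = clean_stft.real, clean_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    mask_real = (y_r * s_r + y_i * s_i) / denom
    mask_imag = (y_r * s_i - y_i * s_r) / denom
    return mask_real + 1j * mask_imag


# Toy time-frequency values (illustrative only, not data from the paper).
clean = np.array([[1.0 + 2.0j, -0.5 + 0.3j]])
noisy = np.array([[2.0 - 1.0j, 1.0 + 1.0j]])
mask = complex_ideal_ratio_mask(clean, noisy)
recovered = mask * noisy  # applying the mask approximately restores `clean`
```

In practice the real and imaginary parts of such a mask are unbounded, so a network predicting it typically needs some form of target compression; whether and how that is done here is not stated in the abstract.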