ISCA Archive Interspeech 2019

Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking

Wangyou Zhang, Ying Zhou, Yanmin Qian

In noisy and reverberant conditions, the performance of current direction of arrival (DOA) estimation methods usually degrades significantly. Inspired by the success of time-frequency masking in speech enhancement and speech separation, this paper proposes new methods that make better use of time-frequency masking in convolutional neural networks to improve the robustness of localization. First, a mask estimation network is developed to assist DOA estimation by either appending the estimated masks to the original input feature or multiplying the feature by them. We then propose a multi-task learning architecture that optimizes the mask and DOA estimation networks jointly, and design and compare two modes of this architecture. Experiments show that all the proposed methods are more robust and generalize better in noisy and reverberant conditions than conventional methods, and that the multi-task methods perform best among all approaches.
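
The two ways of combining an estimated mask with the input feature can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (channels, frames, frequency bins), the shapes, the single-channel mask in [0, 1), and the function names `append_mask` and `multiply_mask` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def append_mask(features, mask):
    """Stack the mask as extra channels: (C, T, F) and (M, T, F) -> (C + M, T, F)."""
    return np.concatenate([features, mask], axis=0)

def multiply_mask(features, mask):
    """Weight every feature channel by the mask: (C, T, F) * (1, T, F) -> (C, T, F)."""
    return features * mask  # the mask broadcasts over the channel axis

# Hypothetical sizes: 4 feature channels, 10 frames, 257 frequency bins.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 10, 257))
mask = rng.uniform(0.0, 1.0, size=(1, 10, 257))  # stand-in for an estimated mask

appended = append_mask(features, mask)
multiplied = multiply_mask(features, mask)
print(appended.shape, multiplied.shape)
```

Under these assumptions, appending leaves the original feature untouched and lets the downstream network decide how to use the mask, while multiplying attenuates time-frequency bins that the mask marks as unreliable before the network sees them.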