ISCA Archive Interspeech 2015

Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

Ning Ma, Guy J. Brown, Tobias May

This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs), to the source azimuth. Our approach was evaluated using a localisation task in which sources were located in a full 360-degree azimuth range. As a result, front-back confusions often occurred due to the similarity of binaural features in the front and rear hemifields. To address this, a head movement strategy was incorporated in the DNN-based model to help reduce the front-back errors. Our experiments show that, compared to a system based on a Gaussian mixture model (GMM) classifier, the proposed DNN system substantially reduces localisation errors under challenging acoustic scenarios in which multiple speakers and room reverberation are present.
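As an illustration of the binaural features named above, the following minimal sketch computes a normalised cross-correlation function (CCF) and an interaural level difference (ILD) for a single frame of left-ear and right-ear signals. It is not the authors' implementation: the function name `binaural_features`, the lag range `max_lag`, the ILD sign convention (right relative to left), and the absence of any auditory filterbank are all assumptions made for illustration.

```python
import numpy as np


def binaural_features(left, right, max_lag=16):
    """Return the CCF over lags -max_lag..max_lag followed by the ILD in dB.

    Hypothetical helper: max_lag and the ILD sign convention are
    illustrative assumptions, not values taken from the paper.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    eps = 1e-12  # guards against division by zero and log of zero

    # Remove the mean so the correlation reflects signal similarity only.
    left = left - left.mean()
    right = right - right.mean()

    # Full cross-correlation; index (n - 1) corresponds to lag 0.
    n = len(left)
    full = np.correlate(left, right, mode="full")
    centre = n - 1
    ccf = full[centre - max_lag: centre + max_lag + 1]

    # Normalise by the frame energies so an identical pair peaks at 1.
    energy_left = np.sum(left ** 2)
    energy_right = np.sum(right ** 2)
    ccf = ccf / (np.sqrt(energy_left * energy_right) + eps)

    # ILD in dB; negative when the right-ear frame is quieter (assumed convention).
    ild = 10.0 * np.log10((energy_right + eps) / (energy_left + eps))

    # Concatenate the whole CCF with the ILD into one feature vector.
    return np.concatenate([ccf, [ild]])
```

A vector of this kind, computed per frame (and, in a fuller system, per frequency channel), is the sort of input a classifier such as a DNN could map to an azimuth estimate. Keeping the whole CCF rather than only its peak location preserves more of the information in the binaural cue.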