Speech recognition in multi-channel environments requires target speaker
localization, multi-channel signal enhancement and robust speech recognition.
We propose a system that addresses these problems. Localization
is performed with a recently introduced probabilistic localization
method that is based on support-vector machine learning of GCC-PHAT
weights and that estimates a spatial source probability map. The main
contribution of the present work is the introduction of a probabilistic
approach to (re-)estimation of location-specific steering vectors based
on weighting of observed inter-channel phase differences with the spatial
source probability map derived in the localization step. Subsequent
speech recognition is carried out with a DNN-HMM system using amplitude
modulation filter bank (AMFB) acoustic features, which are robust to
spectral distortions introduced during spatial filtering.
The system has been evaluated on the CHiME-3 multi-channel ASR dataset.
Recognition was carried out with and without probabilistic steering vector
re-estimation, using either MVDR or delay-and-sum beamforming. Results
indicate that, on real-world evaluation data, the system attains a relative
improvement of 31.98% over the baseline and of 21.44% over a modified
baseline. We note that this improvement is achieved without exploiting
oracle knowledge about speech/non-speech intervals for noise covariance
estimation (which is, however, assumed for baseline processing).
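
To make the signal-processing steps named above more concrete, the following is a minimal NumPy sketch and not the implementation evaluated in this work. It assumes that the spatial source probability map has already been reduced to one non-negative weight per time frame for the target location, and it reads "weighting of observed inter-channel phase differences" as a weighted average of unit-modulus phase terms relative to a reference channel. The function names, array shapes, reference-channel convention and diagonal loading are all illustrative assumptions. The delay-and-sum and MVDR weights follow the standard textbook definitions.

```python
import numpy as np

def weighted_steering_vector(X, p, ref=0):
    """Hypothetical sketch: probability-weighted steering vector estimate.

    X   : complex STFT, shape (channels, frames, freqs)
    p   : assumed per-frame target-location weights, shape (frames,)
    ref : index of the reference channel (an assumption of this sketch)
    Returns a unit-modulus steering vector of shape (channels, freqs).
    """
    # Cross-spectra of every channel with the reference channel.
    cross = X * np.conj(X[ref])[None, :, :]
    # Discard magnitudes so that only inter-channel phase differences remain.
    phasors = cross / np.maximum(np.abs(cross), 1e-12)
    # Average the phase terms over frames, weighted by the source probability.
    avg = np.einsum("t,ctf->cf", p, phasors)
    # Project the average back onto the unit circle.
    return avg / np.maximum(np.abs(avg), 1e-12)

def delay_and_sum_weights(d):
    """Delay-and-sum weights for one frequency bin, w = d / (d^H d)."""
    return d / np.vdot(d, d).real

def mvdr_weights(d, R, diag_load=1e-6):
    """MVDR weights for one bin, w = R^{-1} d / (d^H R^{-1} d).

    R is a (channels, channels) noise covariance estimate; the small
    diagonal loading is an assumption added for numerical stability.
    """
    M = len(d)
    R = R + diag_load * np.trace(R).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / np.vdot(d, Rinv_d)
```

Both beamformers satisfy the distortionless constraint w^H d = 1 in the estimated target direction, and with a noise covariance proportional to the identity matrix the MVDR weights reduce to the delay-and-sum weights.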