This paper introduces a novel time-frequency masking approach for speech
enhancement, based on the consistency of the phase of the cross-spectrum
observed at multiple microphones. The proposed approach is derived
from solutions commonly adopted in spatial source separation and can
be used as a post-filter in traditional multi-channel speech enhancement
schemes. Because it does not rely on a model of the coherence of diffuse
noise, the proposed method complements traditional post-filter implementations
by targeting non-diffuse, coherent sources. It is particularly effective
in domestic scenarios where microphones in a given room capture interfering
coherent sources active in adjacent rooms.
An experimental analysis on the DIRHA-GRID corpus shows that the proposed
method considerably improves the signal-to-interference ratio and can be
used on top of state-of-the-art multi-channel speech enhancement methods.