It is well known that human conversational speech is “sparse” in the time domain, comprising many “off” time segments. This suggests that these “off” periods can be exploited for speech enhancement. We propose an efficient dual-microphone method based on regularized cross-channel cancellation to distinguish overlapping-speech segments from single-speaker segments in multi-speaker conversational environments. In addition, the regularized cancellation results can be reused for speech enhancement along an interference-suppression chain. We evaluate the proposed overlapping speech detection and integrated speech enhancement approaches on an IEEE speech database and on real room recordings under various acoustic conditions; the results show promising speech enhancement improvements from exploiting the “off” time nature of conversational speech.
Index Terms: Overlapping speech detection, cross-channel cancellation, speech enhancement, ℓ1 optimization.
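The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea behind cross-channel cancellation for overlap detection. It rests on a simplified model, which is an assumption: when one talker is active, the second microphone signal is roughly a filtered copy of the first, so a single relative FIR filter can cancel it. When two talkers overlap, no single filter cancels both, and the residual stays large. The filter is fitted by ℓ1-regularized least squares, solved here with ISTA (iterative soft thresholding), a solver swapped in for illustration. The names (`fir`, `soft_threshold`, `cancellation_residual`), the tap count, `lam`, the relative responses `g_a`/`g_b`, and the 0.1 threshold are all hypothetical, and white noise stands in for speech. This is not the authors' implementation.

```python
import math
import random


def fir(h, x):
    """Causal FIR filter: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]


def soft_threshold(v, t):
    """Proximal operator of t*|v|: the scalar shrinkage step of ISTA."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def cancellation_residual(x1, x2, taps=8, lam=0.01, iters=200):
    """Fit a sparse FIR filter h minimizing
        0.5 * ||x2 - h * x1||^2 + lam * ||h||_1
    with ISTA, then return the normalized residual energy
        ||x2 - h * x1||^2 / ||x2||^2.
    (Hypothetical helper; not the method from the paper.)"""
    energy = sum(v * v for v in x1)
    # Step size 1/L with L = taps * ||x1||^2, an upper bound on the
    # largest eigenvalue of X^T X (X = convolution matrix of x1).
    step = 1.0 / (taps * energy)
    h = [0.0] * taps
    for _ in range(iters):
        err = [p - t for p, t in zip(fir(h, x1), x2)]  # X h - x2
        # Gradient X^T (X h - x2): correlate the error with shifted x1.
        grad = [sum(x1[n - k] * err[n] for n in range(k, len(x1)))
                for k in range(taps)]
        h = [soft_threshold(hk - step * gk, step * lam)
             for hk, gk in zip(h, grad)]
    res = [t - p for p, t in zip(fir(h, x1), x2)]
    return sum(r * r for r in res) / sum(v * v for v in x2)


rng = random.Random(0)
N = 256
s_a = [rng.gauss(0, 1) for _ in range(N)]  # white noise stands in for talker A
s_b = [rng.gauss(0, 1) for _ in range(N)]  # white noise stands in for talker B
g_a = [0.0, 0.6, 0.3]        # hypothetical mic1 -> mic2 relative response, talker A
g_b = [0.8, 0.0, 0.0, -0.4]  # hypothetical relative response, talker B

# Single talker: mic 2 is an exact filtered copy of mic 1, so it cancels.
single = cancellation_residual(s_a, fir(g_a, s_a))

# Overlap: each mic hears both talkers; no single filter cancels both.
mix1 = [a + b for a, b in zip(s_a, s_b)]
mix2 = [a + b for a, b in zip(fir(g_a, s_a), fir(g_b, s_b))]
overlap = cancellation_residual(mix1, mix2)

THRESHOLD = 0.1  # hypothetical decision threshold on the residual ratio
print(f"single:  {single:.4f} -> overlap? {single > THRESHOLD}")
print(f"overlap: {overlap:.4f} -> overlap? {overlap > THRESHOLD}")
```

Applied frame by frame, the residual ratio would give a per-frame single/overlap decision. ISTA is used only because it keeps the sketch self-contained in the standard library; any ℓ1 solver minimizing the same objective would serve.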