A method is proposed for the robust segmentation of foreground speech in the presence of background degradation using the zero frequency filtered signal (ZFFS). The speech of the desired speaker, collected over a mobile phone, is termed foreground speech, and the acoustic background picked up by the same sensor, which includes both speech and non-speech sources, is termed background degradation. Zero frequency filtering (ZFF) of speech passes only the information around the zero frequency. Two features derived from the resulting ZFFS, namely the normalized first-order autocorrelation coefficient and the strength of excitation, are observed to differ between foreground speech and background degradation. A foreground speech segmentation method is developed using these two features. Evaluation on utterances containing isolated words of foreground speech and background degradation, collected in a real environment, shows that the method segments foreground speech robustly.
Index Terms: Foreground speech, background degradation, ZFFS, segmentation
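
As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of zero frequency filtering and the two ZFFS features. It is not the implementation evaluated in this work. It assumes the commonly described form of ZFF: a differenced signal passed through a cascade of two ideal zero-frequency resonators, followed by local-mean trend removal. The function names, the 10 ms trend-removal window, the number of trend-removal passes, and the definition of strength of excitation as the ZFFS slope at positive-going zero crossings are all assumptions made for this example.

```python
import numpy as np


def zero_frequency_filter(s, fs, win_ms=10.0, trend_passes=3):
    """Return a zero frequency filtered signal (ZFFS) for the 1-D signal s.

    win_ms and trend_passes are assumed values for this sketch; the window
    is usually chosen on the order of one to two average pitch periods.
    """
    s = np.asarray(s, dtype=np.float64)
    # Difference the signal to remove any DC offset from the recording.
    y = np.diff(s, prepend=s[0])
    # Each ideal zero-frequency resonator, y[n] = 2y[n-1] - y[n-2] + x[n],
    # is a double integrator, so a cascade of two equals four cumulative sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by repeatedly subtracting the local mean.
    # mode="same" zero-pads the edges, so the first and last few windows
    # of the output are unreliable.
    half = max(1, int(round(win_ms * 1e-3 * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y


def normalized_first_order_autocorr(frame):
    """Lag-1 autocorrelation of a frame divided by its lag-0 value (energy)."""
    frame = np.asarray(frame, dtype=np.float64)
    r0 = np.dot(frame, frame)
    if r0 == 0.0:
        return 0.0
    return float(np.dot(frame[:-1], frame[1:]) / r0)


def strength_of_excitation(zffs):
    """Return (indices, slopes) at positive-going zero crossings of the ZFFS.

    The slope at each crossing is used here as the strength of excitation
    (an assumption of this sketch).
    """
    zffs = np.asarray(zffs, dtype=np.float64)
    idx = np.flatnonzero((zffs[:-1] < 0) & (zffs[1:] >= 0))
    slopes = zffs[idx + 1] - zffs[idx]
    return idx, slopes
```

In a segmentation setting, both features would typically be computed per frame of the ZFFS and compared against thresholds; the thresholds and decision logic are not specified here. For long recordings, the repeated cumulative sums grow rapidly, so processing in shorter blocks may be needed to preserve numerical precision.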