ISCA Archive Interspeech 2007

Smooth soft mel-spectrographic masks based on blind sparse source separation

Marco Kühne, Roberto Togneri, Sven Nordholm

This paper investigates the use of DUET, a recently proposed blind source separation method, as a front-end for missing data speech recognition. Based on attenuation and delay estimates from stereo signals, soft time-frequency masks are designed to extract a target speaker from a mixture containing multiple speech sources. A postprocessing step is introduced to remove isolated mask points that can cause insertion errors in the speech decoder. Results of connected digit experiments in a multi-speaker environment demonstrate that the proposed soft masks closely match the performance of the oracle mask designed with a priori knowledge of the source spectra.
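
The two processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mask design or the postprocessing rule, so the function names (`duet_features`, `remove_isolated_points`), the 8-neighbour isolation criterion, and the `threshold` and `min_neighbors` parameters are all assumptions chosen for illustration. The first function computes the per-bin attenuation and delay estimates in the usual DUET manner from the ratio of the two channel spectrograms; the second zeroes mask points that have too few active neighbours.

```python
import numpy as np


def duet_features(X1, X2, omega, eps=1e-12):
    """Per-bin attenuation and delay estimates from two complex STFTs.

    X1, X2: complex arrays of shape (freq, time), one per stereo channel.
    omega:  nonzero angular frequencies of shape (freq,).
    eps:    small constant guarding against division by zero (assumption).
    """
    ratio = (X2 + eps) / (X1 + eps)
    attenuation = np.abs(ratio)
    # Phase is only unambiguous while |omega * delay| < pi.
    delay = -np.angle(ratio) / omega[:, None]
    return attenuation, delay


def remove_isolated_points(mask, threshold=0.5, min_neighbors=1):
    """Zero out active mask points with too few active 8-neighbours.

    Illustrative rule only; the paper's actual postprocessing may differ.
    """
    active = mask > threshold
    padded = np.pad(active, 1, mode="constant", constant_values=False)
    rows, cols = mask.shape
    neighbors = np.zeros(mask.shape, dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the point itself
            neighbors += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    cleaned = mask.copy()
    cleaned[active & (neighbors < min_neighbors)] = 0.0
    return cleaned


# Toy check of the feature estimates: channel 2 is channel 1 scaled by 0.5
# and delayed by 0.5 (in units consistent with omega).
omega = np.array([1.0, 2.0, 3.0])
X1 = np.ones((3, 4), dtype=complex)
X2 = 0.5 * np.exp(-1j * omega[:, None] * 0.5) * X1
att, dly = duet_features(X1, X2, omega)

# Toy check of the mask cleanup: one isolated point, one connected pair.
soft_mask = np.zeros((5, 5))
soft_mask[0, 0] = 0.9          # isolated, removed
soft_mask[2, 2] = 0.8          # has a neighbour, kept
soft_mask[2, 3] = 0.7          # has a neighbour, kept
soft_mask[4, 0] = 0.3          # below threshold, left untouched
cleaned_mask = remove_isolated_points(soft_mask)
```

In this sketch a soft mask would be derived from how close each bin's (attenuation, delay) pair lies to the target speaker's estimated mixing parameters; that mapping is deliberately left out because the abstract does not describe it.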