We introduce a framework for speech enhancement based on convolutive non-negative matrix factorization that leverages available speech data to enhance arbitrary noisy utterances with no a priori knowledge of the speakers or noise types present. Previous approaches have shown the utility of a sparse reconstruction of the speech-only components of an observed noisy utterance. We demonstrate that an underlying speech representation which, in addition to applying sparsity, also adapts to the noisy acoustics improves overall enhancement quality. The proposed system performs comparably to a traditional Wiener filtering approach, and the results suggest that the proposed framework is most useful in moderate- to low-SNR scenarios.
Index Terms: speech enhancement, convolutive non-negative matrix factorization, basis adaptation, sparsity