ISCA Archive Eurospeech 2003

A trainable speech enhancement technique based on mixture models for speech and noise

Ilyas Potamitis, Nikos Fakotakis, George Kokkinakis

We introduce a trainable speech enhancement technique that directly incorporates information about the long-term time-frequency characteristics of speech signals before enhancement begins. Using the Expectation-Maximization (EM) algorithm, we fit mixtures of Gaussian pdfs to the spectral magnitude of noise, taken from recordings of the operational environment, and of clean speech, taken from a clean database. We then apply the Bayesian inference framework to the degraded spectral coefficients and, using Minimum Mean Square Error (MMSE) estimation, derive a closed-form solution for the spectral magnitude estimation task. We evaluate the technique on real, highly non-stationary noise types (e.g. noise from passing aircraft) and demonstrate its efficiency at low SNRs.
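
As an illustration only, the following sketch shows why an MMSE estimate can take a closed form when both the speech and noise priors are Gaussian mixtures. It assumes a simplified scalar model in which the observed magnitude in one frequency bin is the sum of the speech and noise magnitudes, y = x + n. That model is an assumption of this sketch and not necessarily the observation model used in the paper. The function names and the mixture parameters are hypothetical; the parameters stand in for values that EM training would supply.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_estimate(y, speech_gmm, noise_gmm):
    """Posterior mean E[x | y] under the assumed model y = x + n.

    speech_gmm and noise_gmm are lists of (weight, mean, variance) tuples,
    one tuple per mixture component (hypothetical, EM-trained values).
    """
    num = 0.0
    den = 0.0
    for w, mu, s2 in speech_gmm:
        for v, m, t2 in noise_gmm:
            # Joint weight of the component pair given the observation:
            # for this pair, y is Gaussian with summed mean and variance.
            lik = w * v * gauss_pdf(y, mu + m, s2 + t2)
            # Conditional mean of x for this pair (a Wiener-like gain
            # applied to the deviation of y from its expected value).
            cond_mean = mu + s2 / (s2 + t2) * (y - mu - m)
            num += lik * cond_mean
            den += lik
    # Note: den can underflow to zero if y is far from every component pair.
    return num / den


# Hypothetical mixture parameters for a single frequency bin.
speech = [(0.6, 2.0, 1.0), (0.4, 6.0, 2.0)]
noise = [(0.7, 1.0, 0.5), (0.3, 3.0, 1.5)]
x_hat = mmse_estimate(5.0, speech, noise)
```

The estimate is a weighted sum of per-pair conditional means, with weights equal to the posterior probability of each speech/noise component pair, so no numerical integration is needed under these assumptions.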