ISCA Archive Interspeech 2017

Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement

Ricard Marxer, Jon Barker

In recent years, speech enhancement by analysis-resynthesis has emerged as an alternative to conventional noise filtering approaches. Analysis-resynthesis replaces noisy speech with a signal that has been reconstructed from a clean speech model. It can deliver high-quality signals with no residual noise, but at the expense of losing information from the original signal that is not well represented by the model. A recent compromise solution, called constrained resynthesis, addresses this problem by resynthesising only those spectro-temporal regions that are estimated to be masked by noise (conditioned on the evidence in the unmasked regions). In this paper we first extend the approach by: i) introducing multi-condition training and a deep discriminative model for the analysis stage; and ii) introducing an improved resynthesis model that captures within-state cross-frequency dependencies. We then extend the previous stationary-noise evaluation by using real domestic audio noise from the CHiME-2 evaluation. We compare various mask estimation strategies while varying the degree of constraint by tuning the threshold for reliable speech detection. PESQ and log-spectral distance measures show that although mask estimation remains a challenge, it is only necessary to estimate a few reliable signal regions in order to achieve performance close to that achieved with an optimal oracle mask.
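
The role of the reliability threshold can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a local-SNR oracle mask, and the toy log-spectral values are all assumptions made for illustration, and the "model" spectrum simply stands in for whatever a clean speech model would reconstruct. Cells marked reliable keep the observed noisy value; all other cells are replaced by the model estimate. A common form of log-spectral distance is included to show how such an output might be scored.

```python
import math

def binary_mask(speech_db, noise_db, threshold_db=0.0):
    """Oracle-style mask (assumed form): a time-frequency cell is reliable
    (True) when its local SNR in dB exceeds threshold_db."""
    return [[s - n > threshold_db for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_db, noise_db)]

def constrained_resynthesis(noisy_db, model_db, mask):
    """Keep the observed value in reliable cells; use the model's
    reconstruction (hypothetical input here) everywhere else."""
    return [[y if m else x for y, x, m in zip(y_row, x_row, m_row)]
            for y_row, x_row, m_row in zip(noisy_db, model_db, mask)]

def log_spectral_distance(ref_db, est_db):
    """Root-mean-square dB difference per frame, averaged over frames."""
    per_frame = [
        math.sqrt(sum((r - e) ** 2 for r, e in zip(r_row, e_row)) / len(r_row))
        for r_row, e_row in zip(ref_db, est_db)
    ]
    return sum(per_frame) / len(per_frame)

# Toy log-power spectra in dB: 2 frames x 3 frequency bands (made-up values).
speech = [[60.0, 40.0, 55.0], [58.0, 30.0, 50.0]]
noise = [[45.0, 50.0, 50.0], [45.0, 50.0, 52.0]]
noisy = [[60.1, 50.4, 56.2], [58.2, 50.0, 54.1]]   # observed mixture
model = [[59.0, 41.0, 54.0], [57.0, 32.0, 49.0]]   # stand-in model estimate

for threshold in (0.0, 10.0):
    mask = binary_mask(speech, noise, threshold)
    enhanced = constrained_resynthesis(noisy, model, mask)
    reliable = sum(cell for row in mask for cell in row)
    lsd = log_spectral_distance(speech, enhanced)
    print(f"threshold={threshold:4.1f} dB  reliable cells={reliable}  LSD={lsd:.2f} dB")
```

Raising the threshold marks fewer cells as reliable, so more of the output comes from the model and less from the observation, which is the sense in which the threshold tunes the degree of constraint.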