ISCA Archive Interspeech 2019
ISCA Archive Interspeech 2019

Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition

Rupayan Chakraborty, Ashish Panda, Meghna Pandharipande, Sonal Joshi, Sunil Kumar Kopparapu

Front-end processing is one of the ways to impart noise robustness to speech emotion recognition systems in mismatched scenarios. Here, we implement and compare different frontend robustness techniques for their efficacy in speech emotion recognition. First, we use a feature compensation technique based on the Vector Taylor Series (VTS) expansion of noisy Mel-Frequency Cepstral Coefficients (MFCCs). Next, we improve upon the feature compensation technique by using the VTS expansion with auditory masking formulation. We have also looked into the applicability of 10th-root compression in MFCC computation. Further, a Time Delay Neural Network based Denoising Autoencoder (TDNN-DAE) is implemented to estimate the clean MFCCs from the noisy MFCCs. These techniques have not been investigated yet for their suitability to robust speech emotion recognition task. The performance of these front-end techniques are compared with the Non-Negative Matrix Factorization (NMF) based front-end. Relying on extensive experiments done on two standard databases (EmoDB and IEMOCAP), contaminated with 5 types of noise, we show that these techniques provide significant performance gain in emotion recognition task. We also show that along with front-end compensation, applying feature selection to non-MFCC high-level descriptors results in better performance.