ISCA Archive Interspeech 2020

An Alternative to MFCCs for ASR

Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur

The Mel scale is the most commonly used frequency warping function for extracting features for automatic speech recognition (ASR) and is known to be quite effective. However, it was not specifically designed for ASR acoustic models based on deep neural networks (DNNs). In this study, we introduce a frequency warping function that is a modified version of the Mel scale. This warping function is controlled by two parameters, and we use it to propose a new set of features, called modified Mel-frequency cepstral coefficients (modified MFCCs), which use cosine-shaped filters whose bandwidths are computed using a new function. Evaluating the proposed features on a variety of ASR data sets, we see consistent improvements over regular MFCCs and (log) Mel filter bank energies.
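For orientation, the baseline that the abstract refers to can be sketched in a few lines. The snippet below shows only the standard Mel warping, in the common form m = 1127 ln(1 + f/700), and how filter center frequencies are usually spaced evenly on that scale. It is not the modified two-parameter warping proposed in the paper, whose form is not given in this abstract. The function names and the filter count and frequency range in the usage note are illustrative assumptions.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Standard Mel warping (baseline only, not the paper's modified function)."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_center_frequencies(num_filters: int, low_hz: float, high_hz: float) -> list[float]:
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale.

    The range [low_hz, high_hz] is split into num_filters + 1 equal Mel
    intervals; the interior boundaries are the filter centers.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]
```

Calling, say, `mel_center_frequencies(23, 20.0, 8000.0)` (hypothetical settings) gives centers that are densely packed at low frequencies and progressively sparser at high frequencies, which is the behavior a modified warping function would reshape.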