ISCA Archive Interspeech 2011

VTLN in the MFCC domain: band-limited versus local interpolation

Ehsan Variani, Thomas Schaaf

We propose a new, easy-to-implement method to compute a Linear Transform (LT) that performs Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on local interpolation, which is independent of the Mel filter-bank design. The resulting Local Interpolation LT VTLN (LILT-VTLN) is compared, theoretically and experimentally, with a global scheme based on band-limited interpolation (BLI-VTLN) and with the conventional frequency-warping scheme (FFT-VTLN). An investigation of the interoperability of these methods shows that LILT-VTLN performs on par with both FFT-VTLN and BLI-VTLN. A statistical significance test further shows no significant differences among FFT-VTLN, LILT-VTLN, and BLI-VTLN, even when the models and front ends do not match.
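To make the idea of a VTLN linear transform on truncated MFCCs concrete, the following is a minimal sketch, not the authors' implementation. It assumes the transform is composed of three linear steps: an inverse truncated DCT back to log-Mel channel energies, a local (two-neighbour linear) interpolation along the Mel-channel axis at warped positions, and a forward DCT followed by truncation, giving A = D · L(alpha) · Dᵀ. The piecewise-linear warp, its break point (`cutoff`), the orthonormal DCT scaling, and the sizes of 13 cepstra and 23 channels are hypothetical choices. Because each warped channel depends only on its two nearest original channels, the construction needs no knowledge of the Mel filter shapes. The band-limited (BLI) variant is not shown.

```python
import math


def dct_matrix(n_cep, n_chan):
    """Orthonormal DCT-II rows 0..n_cep-1: log-Mel energies (n_chan) -> truncated MFCCs (n_cep)."""
    rows = []
    for k in range(n_cep):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_chan)
        rows.append([scale * math.cos(math.pi * k * (j + 0.5) / n_chan)
                     for j in range(n_chan)])
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def warp_position(i, alpha, n_chan, cutoff=0.8):
    """Hypothetical piecewise-linear warp of the Mel-channel axis.

    Slope alpha up to a break point, then a straight line that maps the last
    channel onto itself, so both band edges stay fixed.
    """
    x_max = n_chan - 1
    x0 = cutoff * x_max / max(alpha, 1.0)
    if i <= x0:
        p = alpha * i
    else:
        p = alpha * x0 + (x_max - alpha * x0) * (i - x0) / (x_max - x0)
    return min(max(p, 0.0), float(x_max))


def local_interp_matrix(alpha, n_chan):
    """Each warped channel is a linear blend of its two nearest original channels."""
    m = [[0.0] * n_chan for _ in range(n_chan)]
    for i in range(n_chan):
        p = warp_position(i, alpha, n_chan)
        lo = int(math.floor(p))
        if lo >= n_chan - 1:
            m[i][n_chan - 1] = 1.0
        else:
            frac = p - lo
            m[i][lo] = 1.0 - frac
            m[i][lo + 1] = frac
    return m


def vtln_cepstral_lt(alpha, n_cep=13, n_chan=23):
    """A = D * L(alpha) * D^T: maps truncated MFCCs to warped truncated MFCCs."""
    d = dct_matrix(n_cep, n_chan)
    return matmul(matmul(d, local_interp_matrix(alpha, n_chan)), transpose(d))


def apply_lt(a, c):
    """Apply the cepstral-domain transform to one MFCC frame."""
    return [sum(a_ij * c_j for a_ij, c_j in zip(row, c)) for row in a]
```

For a warping factor such as 0.9, `A = vtln_cepstral_lt(0.9)` is computed once per speaker or per factor, and each frame is then normalized with `apply_lt(A, frame)`. With alpha = 1 the transform reduces to the identity. Because every interpolation row sums to one, a spectrally flat frame (only c0 nonzero) passes through unchanged for any alpha.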