Vocal tract length normalization (VTLN) is commonly applied utterance-wise with a warping function which makes the assumption of a linear dependence between the vocal tract length and the location of the formants. In this work we propose a data-driven method for enhancing the performance of systems that already use standard VTLN. The method is based on elastic registration to estimate optimal nonparametric transformations to further reduce inter-speaker variabilities. Results show that the proposed method can increase the performance of monophone systems such that it reaches that of a triphone system.
Index Terms: automatic speech recognition, vocal tract length normalization, elastic registration