ISCA Archive Eurospeech 1997

Acoustic front-end optimization for large vocabulary speech recognition

Lutz Welling, N. Haberland, Hermann Ney

In this paper we describe experiments with the acoustic front-end of our large vocabulary speech recognition system. In particular, two aspects are studied: 1) linear transforms for feature extraction and 2) the modelling of the emission probabilities. Experiments are reported on a 5000-word task of the ARPA Wall Street Journal database. For the linear transforms our main results are: a) Filter-bank coefficients yield a word error rate of 9.3%. b) A cepstral decorrelation reduces the error rate from 9.3% to 8.0%. c) By applying a linear discriminant analysis (LDA), a further reduction in the error rate from 8.0% to 7.1% is obtained. d) Recognition results are similar for an LDA applied to filter-bank outputs and to cepstral coefficients. The experiments with density modelling gave the following results: a) Gaussian and Laplacian densities yield similar error rates. b) One single vector of variances or absolute deviations outperforms density-specific or mixture-specific vectors.
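
As an illustration of the cepstral decorrelation step mentioned above, the following is a minimal sketch that applies a discrete cosine transform (DCT-II) to log filter-bank outputs. The abstract does not specify the exact transform, the number of filters, or the number of cepstral coefficients, so the DCT-II form, the function names `dct_matrix` and `cepstral_decorrelation`, and the sizes used here are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def dct_matrix(num_ceps, num_filters):
    """Unnormalized DCT-II basis (assumed form, not taken from the paper).

    Entry (k, j) is cos(pi * k * (j + 0.5) / num_filters).
    """
    k = np.arange(num_ceps)[:, None]
    j = np.arange(num_filters)[None, :]
    return np.cos(np.pi * k * (j + 0.5) / num_filters)


def cepstral_decorrelation(log_fbank, num_ceps):
    """Map log filter-bank outputs to cepstral coefficients.

    log_fbank: array of shape (frames, num_filters)
    returns:   array of shape (frames, num_ceps)
    """
    basis = dct_matrix(num_ceps, log_fbank.shape[1])
    return log_fbank @ basis.T


if __name__ == "__main__":
    # Hypothetical sizes: 100 frames, 20 filters, 12 cepstral coefficients.
    rng = np.random.default_rng(0)
    log_fbank = rng.normal(size=(100, 20))
    ceps = cepstral_decorrelation(log_fbank, num_ceps=12)
    print(ceps.shape)
```

Because the transform is a fixed matrix multiplication, it belongs to the same family of linear transforms as the LDA studied in the paper. The difference is that an LDA matrix would be estimated from labelled training data rather than fixed in advance.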