ISCA Archive Eurospeech 1997

Acoustic front-end optimization for large vocabulary speech recognition

Lutz Welling, N. Haberland, Hermann Ney

In this paper we describe experiments with the acoustic front-end of our large vocabulary speech recognition system. In particular, two aspects are studied: 1) linear transforms for feature extraction and 2) the modelling of the emission probabilities. Experiments are reported on a 5000-word task of the ARPA Wall Street Journal database. For the linear transforms our main results are: a) Filter-bank coefficients yield a word error rate of 9.3%. b) A cepstral decorrelation reduces the error rate from 9.3% to 8.0%. c) By applying a linear discriminant analysis (LDA) a further reduction in the error rate from 8.0% to 7.1% is obtained. d) Recognition results are similar for an LDA applied to filter-bank outputs and to cepstral coefficients. The experiments with density modelling gave the following results: a) Gaussian and Laplacian densities yield similar error rates. b) One single vector of variances or absolute deviations outperforms density-specific or mixture-specific vectors.
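
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the cepstral decorrelation is a plain DCT-II of log filter-bank outputs (a common choice, not confirmed by the abstract) and that emission densities are diagonal Gaussians or products of per-dimension Laplacians. The function names and the toy values are hypothetical.

```python
import math


def dct_cepstrum(log_fbank, num_ceps):
    """Assumed decorrelation: unnormalized DCT-II of log filter-bank outputs."""
    n = len(log_fbank)
    return [
        sum(x * math.cos(math.pi * k * (j + 0.5) / n)
            for j, x in enumerate(log_fbank))
        for k in range(num_ceps)
    ]


def gaussian_log_density(x, mean, var):
    """Log of a diagonal Gaussian; `var` is one vector of variances."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - mi) ** 2 / (2.0 * v)
        for xi, mi, v in zip(x, mean, var)
    )


def laplacian_log_density(x, mean, abs_dev):
    """Log of a product of Laplacians; `abs_dev` is one vector of absolute deviations."""
    return sum(
        -math.log(2.0 * b) - abs(xi - mi) / b
        for xi, mi, b in zip(x, mean, abs_dev)
    )


# Toy usage: two densities that share a single pooled spread vector.
pooled = [1.0, 1.0]                      # hypothetical pooled variances / deviations
means = {"a": [0.0, 0.0], "b": [3.0, 3.0]}
frame = [0.2, -0.1]                      # hypothetical feature vector

best_gauss = max(means, key=lambda s: gaussian_log_density(frame, means[s], pooled))
best_lapl = max(means, key=lambda s: laplacian_log_density(frame, means[s], pooled))
```

With one pooled vector, the normalization term is identical for every density, so the comparison between densities reduces to a weighted squared (Gaussian) or absolute (Laplacian) distance to each mean.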


doi: 10.21437/Eurospeech.1997-555

Cite as: Welling, L., Haberland, N., Ney, H. (1997) Acoustic front-end optimization for large vocabulary speech recognition. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 2099-2102, doi: 10.21437/Eurospeech.1997-555

@inproceedings{welling97_eurospeech,
  author={Lutz Welling and N. Haberland and Hermann Ney},
  title={{Acoustic front-end optimization for large vocabulary speech recognition}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={2099--2102},
  doi={10.21437/Eurospeech.1997-555},
  issn={1018-4074}
}