ISCA Archive Interspeech 2012

Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition

Weifeng Li, Hervé Bourlard

Log energy and its delta parameters, typically derived from the full-band spectrum, are commonly used in automatic speech recognition (ASR) systems. In this paper, we address the problem of estimating log energy in the presence of background noise, which usually reduces the dynamic range of spectral energies. We show theoretically that background noise distorts the trajectories of the "conventional" log energy and its delta parameters, so that the resulting estimates are very poor and no longer describe the speech signal. We therefore propose to estimate log energy from the sub-band spectrum, followed by dynamic range stretching. In speech recognition experiments conducted on the CENSREC-2 in-car database, the proposed log energy (and its corresponding delta parameters) performs very well, yielding an average relative improvement of 27.2% compared with the baseline front-ends. Moreover, further improvement can be achieved by incorporating new MFCCs obtained through non-linear spectral contrast stretching.
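The idea of computing log energy over a sub-band and then stretching its dynamic range can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the sub-band boundaries or the stretching function, so the function names (`log_energy`, `stretch_dynamic_range`), the linear stretching rule, the bin indices, and the toy spectra below are all assumptions chosen only to show the effect. The toy data places constant noise in the low bins and speech in the high bins, which is itself an assumption.

```python
import math

def log_energy(power_frames, lo_bin=0, hi_bin=None, floor=1e-10):
    """Per-frame log energy summed over spectral bins [lo_bin, hi_bin).

    With the default bounds this is the full-band log energy; passing
    lo_bin/hi_bin restricts the sum to a sub-band (hypothetical choice).
    """
    energies = []
    for frame in power_frames:
        band_energy = sum(frame[lo_bin:hi_bin])
        energies.append(math.log(max(band_energy, floor)))
    return energies

def stretch_dynamic_range(values, target_range):
    """Linearly rescale a trajectory so that max - min equals target_range.

    The maximum is kept fixed and lower values are pushed down. This linear
    rule is an illustrative assumption, not the stretching from the paper.
    """
    hi, lo = max(values), min(values)
    span = hi - lo
    if span == 0.0:
        return list(values)  # flat trajectory: nothing to stretch
    scale = target_range / span
    return [hi - (hi - v) * scale for v in values]

# Toy power spectra (4 bins per frame): constant noise in bins 0-1,
# speech energy varying over time in bins 2-3. Values are made up.
frames = [
    [2.5, 2.5, 0.005, 0.005],  # silence + noise
    [2.5, 2.5, 0.5, 0.5],      # weak speech + noise
    [2.5, 2.5, 2.0, 2.0],      # strong speech + noise
    [2.5, 2.5, 0.005, 0.005],  # silence + noise
]

full_band = log_energy(frames)           # noise compresses the range
sub_band = log_energy(frames, lo_bin=2)  # sub-band avoids the noisy bins
stretched = stretch_dynamic_range(full_band, target_range=6.0)

full_range = max(full_band) - min(full_band)
sub_range = max(sub_band) - min(sub_band)
stretched_range = max(stretched) - min(stretched)
```

In this toy case the full-band trajectory has a much smaller dynamic range than the sub-band one, because the constant noise dominates the low-energy frames. The stretching step restores a chosen range but cannot by itself recover the shape of the clean trajectory.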


doi: 10.21437/Interspeech.2012-111

Cite as: Li, W., Bourlard, H. (2012) Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition. Proc. Interspeech 2012, 314-317, doi: 10.21437/Interspeech.2012-111

@inproceedings{li12b_interspeech,
  author={Weifeng Li and Hervé Bourlard},
  title={{Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={314--317},
  doi={10.21437/Interspeech.2012-111},
  issn={2958-1796}
}