ISCA Archive Eurospeech 1991

A study of endpoint detection algorithms in adverse conditions: incidence on a DTW and HMM recognizer

Jean-Claude Junqua, Ben Reaves, Brian Mak

This paper evaluates three recently developed endpoint detection algorithms and compares them with the Lamel and Rosenberg algorithm [1], which is based on energy levels and timing and is enhanced by automatic threshold setting. Performance is reported with each algorithm integrated into two commonly used speech recognizers, a discrete-density vector-quantization-based hidden Markov model (VQ-based HMM) and dynamic time warping (DTW), under various types of noisy conditions. Accuracy was judged by agreement with hand-labeled endpoints and by recognition rates. Results show that 1) a new noise-adaptive algorithm using rms energy, zero-crossing rate, and a set of heuristics generally gives the best results at high or medium signal-to-noise ratio (SNR > 15 dB); 2) the HMM recognizer used with this algorithm performs as well as with hand-labeled endpoints for clean Lombard speech, whereas for noisy Lombard speech, depending on the type of noise and the SNR, recognition accuracy degrades by 1% to 43% compared to manual labeling; 3) at low SNR, the algorithm based on Lamel and Rosenberg's method [1] and enhanced by automatic threshold setting generally performs better than the other algorithms.

Keywords: Endpoint detection, voice activation, adaptive algorithm, Lombard-noisy speech, robustness.
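To make the kind of detector discussed above concrete, the following is a minimal sketch of a generic endpoint detector that combines frame-level rms energy and zero-crossing rate with a noise-adaptive threshold. It is not the algorithm evaluated in the paper, whose heuristics are not given in this abstract. The function names, the frame length, the threshold factors, and the assumption that the first few frames contain only background noise are all illustrative choices.

```python
import math


def frame_features(samples, frame_len):
    """Return (rms, zero-crossing rate) for each non-overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def detect_endpoints(samples, frame_len=80, noise_frames=5,
                     energy_factor=3.0, zcr_factor=2.0,
                     min_run=3, max_extend=5):
    """Return (start_sample, end_sample) of the detected speech, or None.

    Assumption: the first `noise_frames` frames hold background noise only,
    so they can be used to set the thresholds automatically.
    """
    feats = frame_features(samples, frame_len)
    if len(feats) <= noise_frames:
        return None

    # Noise-adaptive thresholds derived from the leading frames.
    noise_rms = sum(f[0] for f in feats[:noise_frames]) / noise_frames
    noise_zcr = sum(f[1] for f in feats[:noise_frames]) / noise_frames
    energy_thr = energy_factor * noise_rms + 1e-9
    zcr_thr = zcr_factor * noise_zcr + 1e-9

    # Timing heuristic: keep only runs of at least `min_run` loud frames.
    first = last = None
    run_start = None
    for i, (rms, _) in enumerate(feats + [(0.0, 0.0)]):  # sentinel ends last run
        if rms > energy_thr:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                if first is None:
                    first = run_start
                last = i - 1
            run_start = None
    if first is None:
        return None

    # Extend each boundary over adjacent frames with a high zero-crossing
    # rate, which may correspond to weak fricatives missed by energy alone.
    for _ in range(max_extend):
        if first > 0 and feats[first - 1][1] > zcr_thr:
            first -= 1
        else:
            break
    for _ in range(max_extend):
        if last < len(feats) - 1 and feats[last + 1][1] > zcr_thr:
            last += 1
        else:
            break

    return first * frame_len, (last + 1) * frame_len
```

Detected boundaries from such a function could then be compared with hand-labeled endpoints, or passed to a recognizer, in the way the evaluation above describes.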