ISCA Archive Interspeech 2007
ISCA Archive Interspeech 2007

Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection

Yanmeng Guo, Qian Qian, Yonghong Yan

Voice activity detection (VAD) in real-world noise is a very challenging task. In this paper, a two-step methodology is proposed to solve the problem. First, segments with non-stationary components, including speech and dynamic noise, are located using sub-band energy sequence analysis (SESA). Secondly, voice is detected within the selected segments employing the proposed method concerning its harmonic structure. Therefore, speech segments can be accurately detected by this rule-based framework. This algorithm is evaluated in several databases in terms of speech/non-speech discrimination and in terms of word accuracy rate when it is used as the front-end of automatic speech recognition (ASR) system. It provides a more reliable performance over the commonly used standard methods.