ISCA Archive Interspeech 2014

Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection

Xiao-Lei Zhang, DeLiang Wang

Voice activity detection (VAD) is an important frontend of many speech processing systems. In this paper, we describe a new VAD algorithm based on boosted deep neural networks (bDNNs). The proposed algorithm first generates multiple base predictions for a single frame from only one DNN and then aggregates the base predictions for a better prediction of the frame. Moreover, we employ a new acoustic feature, multi-resolution cochleagram (MRCG), that concatenates the cochleagram features at multiple spectrotemporal resolutions and shows superior speech separation results over many acoustic features. Experimental results show that bDNN-based VAD with the MRCG feature outperforms state-of-the-art VADs by a considerable margin.
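The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the single DNN is fed a window of frames centred on frame t and emits one speech probability per frame in that window, and that the base predictions for a frame are combined by plain averaging followed by a fixed threshold. The names `aggregate_base_predictions`, `decide`, `window_outputs`, and `half_width` are hypothetical and chosen only for this illustration.

```python
def aggregate_base_predictions(window_outputs, half_width):
    """Average every base prediction that targets the same frame.

    Assumption (not stated in the abstract): window_outputs[t][k] is the
    DNN's speech probability for frame t + (k - half_width), produced when
    the input window is centred on frame t.
    """
    num_frames = len(window_outputs)
    sums = [0.0] * num_frames
    counts = [0] * num_frames
    for t, outputs in enumerate(window_outputs):
        if len(outputs) != 2 * half_width + 1:
            raise ValueError("each window must hold 2 * half_width + 1 outputs")
        for k, prob in enumerate(outputs):
            target = t + k - half_width
            # Predictions that fall outside the utterance are discarded.
            if 0 <= target < num_frames:
                sums[target] += prob
                counts[target] += 1
    # Every frame receives at least its own centre prediction, so counts > 0.
    return [s / c for s, c in zip(sums, counts)]


def decide(scores, threshold=0.5):
    """Turn aggregated scores into speech / non-speech decisions."""
    return [score >= threshold for score in scores]


# Toy example: 3 frames, windows of 3 outputs each (half_width = 1).
toy_outputs = [
    [0.9, 0.2, 0.4],  # window centred on frame 0
    [0.6, 0.8, 0.7],  # window centred on frame 1
    [0.9, 0.5, 0.1],  # window centred on frame 2
]
scores = aggregate_base_predictions(toy_outputs, half_width=1)
labels = decide(scores)
```

In this toy case frame 1 receives three base predictions (0.4, 0.8, and 0.9) while the edge frames receive two each, so interior frames benefit from more votes than frames at the utterance boundaries.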


doi: 10.21437/Interspeech.2014-367

Cite as: Zhang, X.-L., Wang, D. (2014) Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection. Proc. Interspeech 2014, 1534-1538, doi: 10.21437/Interspeech.2014-367

@inproceedings{zhang14f_interspeech,
  author={Xiao-Lei Zhang and DeLiang Wang},
  title={{Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1534--1538},
  doi={10.21437/Interspeech.2014-367},
  issn={2308-457X}
}