ISCA Archive AVSP 2010

Decision fusion by boosting method for multi-modal voice activity detection

Shin'ichi Takeuchi, Takashi Hashiba, Satoshi Tamura, Satoru Hayamizu

In this paper, we propose a multi-modal voice activity detection (VAD) system that uses audio and visual information. In multi-modal speech signal processing, there are two methods for fusing audio and visual information: concatenating the audio and visual features, or employing audio-only and visual-only classifiers and then fusing the unimodal decisions. We investigate the effectiveness of decision fusion based on the results from AdaBoost. AdaBoost is a machine learning method that constructs an effective classifier by combining weak classifiers. It classifies input data into two classes based on the weighted results of the weak classifiers. In the proposed method, this fusion scheme is applied to the decision fusion of multi-modal VAD. Experimental results show that the proposed method is generally more effective.
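The weighted-vote fusion described above can be illustrated with a minimal sketch of discrete AdaBoost. It is not the authors' implementation: the abstract does not specify the weak classifiers or the features, so the three per-frame decision streams below (`audio_energy`, `visual_lip`, `audio_zcr`), their toy values, and the function names are all assumptions made for illustration. Each stream gives +1 (speech) or -1 (non-speech) per frame, and AdaBoost learns one weight per selected stream.

```python
import math

def train_adaboost(decisions, labels, rounds):
    """Discrete AdaBoost over a fixed pool of weak decisions.

    decisions[k][i] is the +1/-1 decision of weak classifier k on frame i.
    labels[i] is +1 (speech) or -1 (non-speech).
    Returns a list of (classifier index, weight alpha) pairs.
    """
    n = len(labels)
    w = [1.0 / n] * n  # start with uniform frame weights
    model = []
    for _ in range(rounds):
        # Weighted error of every weak classifier under the current weights.
        errs = [sum(wi for wi, d, y in zip(w, dec, labels) if d != y)
                for dec in decisions]
        k = min(range(len(decisions)), key=errs.__getitem__)
        if errs[k] >= 0.5:  # no classifier is better than chance: stop
            break
        err = max(errs[k], 1e-10)  # avoid division by zero for a perfect one
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((k, alpha))
        # Raise the weight of misclassified frames, lower the others.
        w = [wi * math.exp(-alpha * y * d)
             for wi, d, y in zip(w, decisions[k], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def fuse(model, frame_decisions):
    """Weighted vote of the selected weak decisions for one frame."""
    score = sum(alpha * frame_decisions[k] for k, alpha in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): six frames, three unimodal decision streams.
labels       = [+1, +1, +1, -1, -1, -1]
audio_energy = [-1, +1, +1, -1, -1, -1]  # wrong on frame 0
visual_lip   = [+1, +1, +1, +1, -1, -1]  # wrong on frame 3
audio_zcr    = [+1, -1, +1, -1, +1, -1]  # wrong on frames 1 and 4
decisions = [audio_energy, visual_lip, audio_zcr]

model = train_adaboost(decisions, labels, rounds=3)
fused = [fuse(model, [dec[i] for dec in decisions])
         for i in range(len(labels))]
print(model)
print(fused)
```

On this toy data each stream misclassifies at least one frame, while the weighted vote over the three streams agrees with the labels on every frame. In a real system the weights would be learned on training data and the weak classifiers would come from actual audio and visual features.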

Index Terms: voice activity detection, VAD, multi-modal