Automatic Speech Recognition (ASR) for a robot should be robust against noise because robots operate in noisy environments. Audio-Visual (AV) integration is one of the key ideas for improving robustness in such environments. This paper proposes an ASR system for a robot that applies AV integration to both Voice Activity Detection (VAD) and speech decoding. In VAD, we apply AV integration based on a Bayesian network; in speech decoding, we apply AV integration based on stream weights. We implemented a prototype AV-ASR system based on the proposed method and evaluated it under several conditions. Preliminary results showed that the proposed system improves the robustness of ASR even in auditorily or visually contaminated situations.
Index Terms: audio-visual integration, speech recognition, voice activity detection