In this paper we propose the existence of an audio-visual scene analysis (AVSA) module that integrates primitive information from the auditory and visual modalities. This module forms correspondences between the auditory and visual streams based on primitive properties of each representation. Through these correspondences, visual information is employed to aid the segregation of acoustic sources. This enhanced segregation may be partly responsible for the increased intelligibility of audio-visual speech. The paper presents the initial results of a planned series of audio-visual speech experiments designed to test this account. Specifically, the experiments reported here address the question of whether visible movement of the speech articulators can protect speech from the effects of masking by noise. It is shown that a reduction in temporal uncertainty due to visual information may lower the detection threshold for consonant-vowel (CV) syllables in noise. A comparable benefit was not observed in a parallel experiment testing consonant identification.