One question central to understanding the perceptual basis of audiovisual speech integration concerns where in the process the auditory and visual streams are combined. Over 20 years of research on this question has provided answers ranging from integration occurring at the level of the informational input to integration occurring only after segment matches are made independently for each modality. In this paper I present a modality-neutral account of audiovisual speech integration. On this account, speech perception is inherently amodal, and the information for speech is taken to consist of kinematic primitives that can be instantiated in any modality. Modality is considered to be invisible to the speech function, and integration occurs as a function of the input information itself. Four classes of support for a modality-neutral account will be presented: a) the primacy and ubiquity of multimodal speech; b) behavioral evidence for very early integration; c) neuropsychological evidence for very early integration; and d) evidence for informational similitude across modalities.