This paper describes two experiments conducted to identify the role of synchronization in the perception of speech and gesture communication and to isolate the parameters that determine the perception of temporal alignment. The results of the first experiment show that synchronization between the audio and visual signals determines the felicitousness of a multimodal utterance. The second experiment shows that prosodic alignment is one parameter that our subjects used to judge the well-formedness of speech and gesture input.