Speech is a multimodal phenomenon at both the perception and production ends, and this includes the suprasegmental level of speech. This paper focuses on the auditory-visual nature of lexical tones, the suprasegmental units of speech that characterise tone languages. A multimodal corpus consisting of audio and Optotrak recordings of 33 markers on the face and head was collected from 3 native speakers of Cantonese. The recorded trajectories of the Optotrak markers were parameterized as polynomial coefficients and used as input to Linear Discriminant Analysis models for classification among the 6 Cantonese lexical tones. Face and head motion classified lexical tones with above-chance accuracy for each speaker individually and for all speakers combined. Further analyses were carried out to determine which face regions and types of head motion had a stronger influence on lexical tone classification accuracy; the movements of the eyebrows and of the larynx stood out.
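The classification pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual code: the marker trajectories here are synthetic stand-ins for the Optotrak corpus, and the polynomial degree, number of tokens, and train/test split are assumptions chosen only for demonstration. It shows the two stages the abstract names: fitting polynomial coefficients to each trajectory as features, then training a Linear Discriminant Analysis model over the 6 tone classes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def trajectory_features(trajectory, degree=3):
    """Fit a polynomial to each spatial coordinate of one marker
    trajectory and concatenate the coefficients into a feature vector.
    `degree` is an illustrative choice, not taken from the paper."""
    t = np.linspace(0.0, 1.0, len(trajectory))
    return np.concatenate(
        [np.polyfit(t, trajectory[:, d], degree)
         for d in range(trajectory.shape[1])]
    )

# Synthetic stand-in for the corpus: 6 tones, 20 tokens each,
# each token a (frames, 3) trajectory with tone-dependent curvature.
rng = np.random.default_rng(0)
X, y = [], []
for tone in range(6):
    for _ in range(20):
        t = np.linspace(0.0, 1.0, 50)
        traj = np.stack(
            [(tone + 1) * t**2 + 0.05 * rng.standard_normal(50)
             for _ in range(3)],
            axis=1,
        )
        X.append(trajectory_features(traj))
        y.append(tone)
X, y = np.array(X), np.array(y)

# Train LDA on half the tokens, score on the held-out half;
# chance level for 6 balanced classes is 1/6.
lda = LinearDiscriminantAnalysis()
lda.fit(X[::2], y[::2])
accuracy = lda.score(X[1::2], y[1::2])
```

On this deliberately separable synthetic data the held-out accuracy is far above the 1/6 chance level; on real motion-capture data the margin would of course be smaller, as the paper's per-speaker results indicate.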