Human multimodal communicative behaviors form a tightly integrated whole. By aligning gestural features with a carefully time-tagged transcription of the speech, we can observe how gesture features and discourse unit transitions cohere. Space usage (SU) is a key gestural component, and we summarize the theory behind it. In our experiments, in which subjects make action plans around a terrain map, SUs become key organizational loci around which the discourse is built. Our vision-based approach extracts, from stereo video, SU histograms describing the locus of motion of a speaker's dominant hand. An N × N fuzzy correlation of these histograms yields a correlation space in which similar SUs are clustered. By locating the cluster transitions, we can locate topical shifts in the discourse. We present results comparing the transitions extracted from a sentential coding with a psycholinguistic semantic coding. We repeat the comparison using uniformly distributed time units and demonstrate the ability to recover discourse transitions.
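As a rough illustration of the pipeline summarized above, the following minimal sketch builds an N × N similarity matrix over per-unit SU histograms and flags boundaries where adjacent units stop resembling each other. It rests on assumptions not stated in the abstract: the fuzzy correlation is approximated here by a fuzzy-AND (minimum) overlap of normalized histograms, transitions are detected with a fixed threshold on adjacent-unit similarity, and all names (`fuzzy_similarity`, `correlation_matrix`, `transitions`) and the toy data are hypothetical.

```python
# Hedged sketch: NOT the authors' method. The similarity measure and the
# threshold rule below are illustrative stand-ins.

def normalize(hist):
    """Scale a histogram so its bins sum to 1 (all-zero input stays zero)."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [v / total for v in hist]


def fuzzy_similarity(h1, h2):
    """Fuzzy-AND overlap: sum of bin-wise minima of two normalized histograms.

    Returns a value in [0, 1]; 1 means identical distributions.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))


def correlation_matrix(histograms):
    """Build the N x N matrix of pairwise similarities between discourse units."""
    normed = [normalize(h) for h in histograms]
    n = len(normed)
    return [[fuzzy_similarity(normed[i], normed[j]) for j in range(n)]
            for i in range(n)]


def transitions(matrix, threshold=0.5):
    """Return indices i where unit i is dissimilar to unit i-1.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    return [i for i in range(1, len(matrix)) if matrix[i - 1][i] < threshold]


# Toy data: four units, each a 4-bin histogram of hand positions.
# Units 0-1 use the left of the space; units 2-3 use the right.
units = [
    [8, 2, 0, 0],
    [7, 3, 0, 0],
    [0, 1, 9, 0],
    [0, 0, 8, 2],
]

corr = correlation_matrix(units)
shifts = transitions(corr)
print(shifts)
```

In this toy case the only flagged boundary is between the second and third units, where the hand moves to a different region of the space. A real system would replace the threshold rule with a clustering step over the correlation space.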