We present a high-level formalism for specifying verbal and nonverbal output from a multimodal dialogue system. The output specification is XML-based and provides information about the communicative functions of the output without detailing how these functions are realised. The specification can be used to control an animated character that uses speech and gestures. We give examples from an implementation in a multimodal spoken dialogue system, and describe how facial gestures are implemented in a 3D-animated talking agent within this system.