This paper describes GESOM, a model for generating generalised, high-level multi-modal output from dialogue systems. Its aim is to let a dialogue system produce output for a variety of output devices and modalities with minimal changes to the system's own output generation. The model was developed and tested within the AdApt spoken dialogue system, from which most of the examples in this paper are taken.