Today's multimodal systems, which accept full-body (3D) gestures and speech as input modalities, are largely restricted to easily interpretable coverbal gestures with a predefined shape and meaning. In this paper, we propose methods to abstract the concrete shape of gestures by means of high-level features and to integrate them with coexpressive words on the basis of the words' phonological attributes. We discuss the application of this approach to a class of gestures useful in virtual design, and we sketch our technical environment and initial implementation approaches toward a prototype system.
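To make the core idea concrete, the following is a minimal sketch of what such an abstraction might look like: a gesture reduced to high-level features (hand shape, movement, and the timing of its expressive stroke) and aligned with a coexpressive word via a phonological attribute, here the onset of the word's stressed syllable. All class and function names are hypothetical illustrations, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureFeatures:
    """A gesture abstracted from its raw 3D trajectory into high-level features (assumed feature set)."""
    handedness: str      # e.g. "left", "right", "both"
    hand_shape: str      # e.g. "flat", "fist", "pointing"
    movement: str        # e.g. "linear", "circular"
    stroke_onset: float  # time (s) of the expressive stroke phase
    stroke_offset: float

@dataclass
class Word:
    """A spoken word with the phonological attribute used for alignment."""
    text: str
    onset: float                    # word onset time (s)
    stressed_syllable_onset: float  # phonological anchor for gesture-speech integration

def coexpressive_word(gesture: GestureFeatures, words: list[Word],
                      tolerance: float = 0.3) -> Optional[Word]:
    """Pick the word whose stressed syllable lies closest in time to the
    gesture's stroke onset; return None if no word is within tolerance."""
    best: Optional[Word] = None
    best_dist = tolerance
    for w in words:
        dist = abs(w.stressed_syllable_onset - gesture.stroke_onset)
        if dist <= best_dist:
            best, best_dist = w, dist
    return best
```

A pointing gesture with a stroke at 1.2 s would, under this sketch, be bound to a word such as "there" whose stressed syllable begins near that instant, rather than to a gesture template with a fixed, predefined shape.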