In this paper we discuss the problems inherent in modular architectures, which lie at the root of most current speech understanding systems. We show why the use of different linguistic models brings about communication problems inside such systems, and we therefore detail the steps that could lead to an integration of these different types of knowledge. Finally, we propose a structure for a new architecture, illustrating its feasibility through a study of temporal information in man-machine dialogues.