In this paper, we present an integrated approach for recognizing both the word sequence and the syntactic-prosodic structure of a spontaneous utterance. The approach aims to improve the performance of the understanding component of speech understanding systems by exploiting not only acoustic and syntactic information, but also prosodic information directly within the speech recognition process. Whereas spoken utterances are commonly modelled as unstructured word sequences in the speech recognizer, our approach includes phrase (or clause) boundary information in the language model and provides HMMs to model the acoustic and prosodic characteristics of phrase boundaries and disfluencies. This methodology has two major advantages over purely word-based speech recognizers. First, additional syntactic information is determined by the speech recognizer, which facilitates parsing and helps resolve syntactic and semantic ambiguities. Second, the integrated model yields significantly better word accuracies than the traditional word-based approach.
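As a rough illustration of treating phrase boundaries as explicit events in the language model, the sketch below interleaves a hypothetical boundary token into boundary-annotated transcripts and scores word/boundary sequences with a simple add-one-smoothed bigram model. The token name <B>, the toy data, and the smoothing scheme are illustrative assumptions, not the models used in this paper.

```python
from collections import defaultdict
import math

BOUNDARY = "<B>"  # hypothetical phrase-boundary token (illustrative name)

# Toy boundary-annotated training transcripts (illustrative data only).
corpus = [
    ["yes", BOUNDARY, "that", "suits", "me", "fine", BOUNDARY],
    ["on", "monday", BOUNDARY, "i", "am", "busy", BOUNDARY],
]

# Collect bigram and unigram counts over interleaved word/boundary tokens.
bigram = defaultdict(int)
unigram = defaultdict(int)
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        bigram[(prev, cur)] += 1
        unigram[prev] += 1

vocab = {t for s in corpus for t in s} | {"<s>", "</s>"}

def log_prob(sequence):
    """Add-one-smoothed bigram log probability of a word/boundary sequence."""
    tokens = ["<s>"] + sequence + ["</s>"]
    lp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigram[(prev, cur)] + 1) / (unigram[prev] + len(vocab))
        lp += math.log(p)
    return lp

# During decoding, the recognizer would compare hypotheses that differ only in
# where boundary tokens are placed and keep the higher-scoring alternative.
with_boundary = ["on", "monday", BOUNDARY, "i", "am", "busy"]
without_boundary = ["on", "monday", "i", "am", "busy"]
print(log_prob(with_boundary), log_prob(without_boundary))
```

In the integrated approach, such boundary tokens would additionally be tied to acoustic-prosodic HMMs that score the signal around the hypothesized boundary, so that the language model and the acoustic model jointly decide where boundaries are placed.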