This paper is based on experience gained in connection with the German Government's spoken language data collections PHONDAT (1) and Verbmobil-PHONDAT. It addresses some of the basic decisions made to ensure that the resulting speech signal database system can find the broadest possible application. PHONDAT was planned not only to provide speech technology with reliable German training and assessment material for speech recognition devices and for text-to-speech systems, but also to enable the further development of phonetic knowledge of spoken German. Only after enough empirical data have become available in a proper format can the actually spoken form of a language be formally represented by a phonetic theory in a sufficiently complete manner. If we call such a phonetic theory of a given spoken language a complete phonetic theory (CPT), then the final aim of the PHONDAT projects is to contribute to the development of a CPT of spoken German.
Keywords: phonetic segmentation, phonetic labelling, speech data bases.

(1) PHONDAT is the name of a joint research project of the universities of Bonn, Braunschweig, Kiel and Munich.