The automatic prosodic annotation of large speech corpora is receiving increasing attention because appropriate databases are needed for training prosodic models in speech synthesis and recognition. At the linguistic level, correct phrase and accent marking are essential processing steps. The authors developed a neural-network-based method for signal-based phrase break prediction and tested it on two different speech databases.
The structure of the multilayer feed-forward neural network (MFN) was optimized and adapted to the target database and to the specific annotation task. The method is rather data-sensitive: its results depend on the human labelers and on small differences between training databases, such as the frequency of occurrence or the strength of phrase breaks. The MFN method can easily be adapted to the characteristics of different databases (long or short phrases, special formats such as dates or web addresses, etc.). When applied to different databases that contain phrase markers set by human experts, phrase break recognition rates vary from 79% to 97%.
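
As an illustration of how a recognition rate of this kind could be computed against human phrase markers, the following is a minimal sketch. It assumes that the rate is the fraction of human-labeled breaks that the automatic method also marks, and that both annotations are aligned sequences with one 0/1 flag per candidate position; the source states neither the exact definition nor the data format, and the function name and sample data are hypothetical.

```python
from typing import Sequence


def break_recognition_rate(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of reference (human-labeled) breaks that were also predicted.

    Assumption: 1 marks a phrase break, 0 marks no break, and both
    sequences cover the same candidate positions in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    total_breaks = sum(1 for r in reference if r == 1)
    if total_breaks == 0:
        raise ValueError("reference contains no phrase breaks")

    # Count positions where the human label and the prediction both mark a break.
    hits = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    return hits / total_breaks


# Hypothetical sample data: four human-labeled breaks, three of them detected.
reference = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
predicted = [0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
rate = break_recognition_rate(reference, predicted)
print(f"recognition rate: {rate:.0%}")
```

Under this definition, falsely inserted breaks (such as the fourth position in the sample) do not lower the rate; a measure that penalizes them would have to be reported separately.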