Detecting and modeling appropriate phrasing from input text is important for producing synthetic speech that sounds intelligible and natural. Knowledge of phrase structure influences, e.g., the placement and length of pauses and the realization of phrase-final boundary contours, both of which can affect a listener's perception, from naturalness to semantic interpretation. In this work, we model the occurrence and type of phrase breaks from purely textual features, paying close attention to how system performance generalizes in-domain and out-of-domain across corpora of various types (such as broadcast news, spontaneous speech, and synthesis databases), and how it varies with the subsets of syntactic and lexical features used.
Index Terms: Prosody Modeling, Prosodic Assignment, Speech Synthesis