ISCA Archive Interspeech 2023

Neural Speech Synthesis with Enriched Phrase Boundaries

Marie Kunešová, Jindřich Matoušek

Prosodic phrasing is one of the factors influencing the naturalness of synthesized speech. In this paper, we enrich the phonetic representation for neural speech synthesis with additional markers denoting the strength of phrase breaks between words. These markers are assigned to the training data automatically, using our previously introduced model for audio-based phrase boundary detection. We tested the approach with two different levels of resolution for the break indices: either ten distinct levels (P10) or only four “ToBI-like” levels (P4). Listening tests with two different speaker voices show a statistically significant preference among listeners for P10 or P4 over the baseline speech synthesis without these markers (P0), although which of the two is judged better depends on the voice.
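
To make the idea of an enriched phonetic representation more concrete, the following is a minimal sketch of how break markers might be interleaved with a phoneme sequence. It is an illustration under stated assumptions, not the implementation used in the paper: the boundary strength scale of [0, 1], the uniform quantization, the `<bN>` token format, and the function names `quantize_break` and `enrich_phonemes` are all hypothetical.

```python
from typing import List, Sequence


def quantize_break(strength: float, levels: int) -> int:
    """Map a boundary strength in [0, 1] to a break index in 0..levels-1.

    Assumption: strengths are normalized to [0, 1] and quantized uniformly;
    the paper may derive its break indices differently.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return min(int(strength * levels), levels - 1)


def enrich_phonemes(
    words: Sequence[Sequence[str]],
    strengths: Sequence[float],
    levels: int,
) -> List[str]:
    """Insert a hypothetical <bN> marker after every word except the last.

    words     -- one phoneme list per word
    strengths -- one boundary strength per gap between adjacent words
    levels    -- 10 for a P10-style input, 4 for P4, 0 for a marker-free P0
    """
    if len(strengths) != len(words) - 1:
        raise ValueError("need exactly one strength per word boundary")
    tokens: List[str] = []
    for i, phonemes in enumerate(words):
        tokens.extend(phonemes)
        # Only gaps between words receive a marker, and only if levels > 0.
        if levels > 0 and i < len(strengths):
            tokens.append(f"<b{quantize_break(strengths[i], levels)}>")
    return tokens


# Toy utterance of three words with a weak and a strong boundary.
words = [["h", "e"], ["l", "o"], ["w", "er", "l", "d"]]
strengths = [0.05, 0.85]

p10 = enrich_phonemes(words, strengths, levels=10)
p4 = enrich_phonemes(words, strengths, levels=4)
p0 = enrich_phonemes(words, strengths, levels=0)
```

In this sketch the same utterance yields three input variants that differ only in marker resolution, which mirrors the P10, P4, and P0 conditions compared in the listening tests.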