ISCA Archive Interspeech 2022

Strategies for developing a Conversational Speech Dataset for Text-To-Speech Synthesis

Adaeze O. Adigwe, Esther Klabbers

There have been many efforts to improve the quality of speech synthesis systems in conversational AI. Although state-of-the-art systems can produce natural-sounding speech, the generated speech often lacks prosodic variation and is not always suited to the task. In this paper, we examine methods for collecting dialogue data to train our acoustic models. We collect speech using three different setups: (1) random read-aloud sentences; (2) performed dialogues; (3) semi-spontaneous dialogues. We analyze prosodic and textual properties of the data collected in these setups and offer recommendations for collecting data for speech synthesis in conversational AI settings.