ISCA Archive Interspeech 2023
ISCA Archive Interspeech 2023

So-to-Speak: An Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS

Éva Székely, Siyang Wang, Joakim Gustafson

In recent years, numerous speech synthesis systems have been proposed that feature multi-dimensional controllability, generating a level of variability that surpasses traditional TTS systems by orders of magnitude. However, it remains challenging for developers to comprehend and demonstrate the potential of these advanced systems. We introduce So-to-Speak, a customisable interface tailored for showcasing the capabilities of different controllable TTS systems. The interface allows for the generation, synthesis, and playback of hundreds of samples simultaneously, displayed on an interactive grid, with variation both low level prosodic features and high level style controls. To offer insights into speech quality, automatic estimates of MOS scores are presented for each sample. So-to-Speak facilitates the audiovisual exploration of the interaction between various speech features, which can be useful in a range of applications in speech technology.