ISCA Archive SSW 2021

Are we truly modeling expressiveness? A study on expressive TTS in Brazilian Portuguese for real-life application styles

Lucas H. Ueda, Paula D. P. Costa, Flavio O. Simoes, Mário U. Neto

This paper presents a study of expressive speech synthesis applied to real-life application styles in Brazilian Portuguese. We explore the use of data with different recording conditions in state-of-the-art expressive TTS architectures. Our results suggest that variability in the recording conditions of the same style, combined with guided training of the Reference Encoder’s latent representation space, assists in modeling non-archetypal expressivities. Additionally, we propose an alternative way to evaluate the model’s ability to generate expressive speech at the preliminary-results stage, based on a classifier using GeMAPS features.