ISCA Archive Interspeech 2023

Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis

Won-Gook Choi, So-Jeong Kim, TaeHo Kim, Joon-Hyuk Chang

Recently, diffusion models have achieved higher sample quality with guidance, such as classifier guidance and classifier-free guidance. However, these guidance methods have limitations: they require extra classifiers or joint training, and they incur additional sampling cost. In this study, we propose a prior-free guidance diffusion model and prior-free guided text-to-speech (PfGuided-TTS), which can generate speech of quality as high as that obtained with other guidance methods, without extra training resources or computational cost. On LJSpeech, PfGuided-TTS generates speech of higher human-perceived quality than existing autoregressive (AR) and non-AR models, including diffusion-based TTS. In addition, we provide a schematic explaining why and how classifier- and prior-free guided scores produce high-fidelity samples.
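To make the cost argument concrete, the sketch below shows the standard classifier-free guidance combination that the abstract contrasts against, not the prior-free guidance proposed here (which the abstract does not define). It assumes a hypothetical score network `score_fn(x_t, t, cond)` that was jointly trained to also return an unconditional score when `cond` is `None`. The guided score is `s_uncond + w * (s_cond - s_uncond)`, so every sampling step needs two network evaluations instead of one.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical score network: (x_t, t, condition or None) -> estimated score.
ScoreFn = Callable[[Vector, float, Optional[str]], Vector]


def classifier_free_guided_score(
    score_fn: ScoreFn, x_t: Vector, t: float, cond: str, w: float
) -> Vector:
    """Classifier-free guidance: s_uncond + w * (s_cond - s_uncond).

    w = 1 recovers the plain conditional score; w > 1 strengthens guidance.
    """
    # Unconditional pass: requires joint training with condition dropout,
    # and is the extra forward pass that adds sampling cost.
    s_uncond = score_fn(x_t, t, None)
    # Conditional pass (e.g. conditioned on the input text).
    s_cond = score_fn(x_t, t, cond)
    return [u + w * (c - u) for u, c in zip(s_uncond, s_cond)]
```

With a toy score function, one guided step calls the network twice, once without and once with the condition.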