ISCA Archive Interspeech 2024

QGAN: Low Footprint Quaternion Neural Vocoder for Speech Synthesis

Aryan Chaudhary, Vinayak Abrol

Neural vocoders have recently achieved superior synthesis quality by leveraging advances in diffusion, flow-based, transformer, and GAN methods. However, such models have grown substantially in space and time complexity, which makes deploying speech synthesis systems in resource-constrained scenarios challenging. To address this, we present a novel low-footprint Quaternion Generative Adversarial Network (QGAN) for efficient, high-fidelity speech synthesis without compromising audio quality. QGAN achieves structural model compression over a conventional GAN by using quaternion convolutions in the generator and a modified multi-scale/period discriminator. To ensure model stability, we also propose weight normalization in the quaternion domain. We show the effectiveness of QGAN with large-scale experiments on English and Hindi datasets. In addition, using loss landscape visualization, we analyze the learning behaviour of the proposed QGAN model.
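As a rough illustration of why quaternion layers can compress a model, the sketch below builds the real-valued matrix that applies a quaternion weight through the Hamilton product and compares weight counts with an ordinary dense layer of the same width. It is a generic construction, not the QGAN implementation: the names `hamilton_weight` and `param_counts` are hypothetical, biases and convolution kernel dimensions are ignored, and the per-layer factor of four it shows is a property of this construction rather than a compression figure reported for QGAN.

```python
import numpy as np

def hamilton_weight(r, i, j, k):
    """Real (4m x 4n) matrix for left multiplication by the quaternion
    weight W = r + i*I + j*J + k*K, where r, i, j, k are (m x n) arrays.
    Inputs are assumed to be stacked as [real, I, J, K] components."""
    return np.block([
        [r, -i, -j, -k],
        [i,  r, -k,  j],
        [j,  k,  r, -i],
        [k, -j,  i,  r],
    ])

def param_counts(in_ch, out_ch):
    """Weight counts for a real layer vs. a quaternion layer (illustrative).
    Channel counts must be divisible by 4 to group them into quaternions."""
    assert in_ch % 4 == 0 and out_ch % 4 == 0
    n, m = in_ch // 4, out_ch // 4
    real = in_ch * out_ch   # one free weight per input/output pair
    quat = 4 * m * n        # four shared (m x n) component matrices
    return real, quat

# Example: 8 input channels, 16 output channels (hypothetical sizes).
rng = np.random.default_rng(0)
r, i, j, k = (rng.standard_normal((4, 2)) for _ in range(4))
W = hamilton_weight(r, i, j, k)   # shape (16, 8), built from 32 free weights
real, quat = param_counts(8, 16)  # 128 real weights vs. 32 quaternion weights
```

The same four component matrices are reused in every block row, which is where the weight sharing comes from; a quaternion convolution applies the same pattern to convolution kernels instead of dense matrices.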