ISCA Archive Interspeech 2024

Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation

Pavan Kalyan, Preeti Rao, Preethi Jyothi, Pushpak Bhattacharyya

While task arithmetic has been shown to be useful for steering the behaviour of neural models in NLP and vision, it has not yet been applied to speech. Moreover, the tasks studied so far have been restricted to text classification and generation, and image classification. In this work, we extend the idea of task vectors to emotional speech synthesis. We build emotion vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning it for a given emotion. These emotion vectors can be modified or combined through arithmetic operations such as negation and addition, with the aim of steering the resulting model's behaviour when it generates emotional speech. We also show that an emotion vector can achieve the desired transfer of emotion to a speaker not seen during training.
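The weight arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Model weights are represented as a hypothetical mapping from parameter names to flat lists of floats; a real TTS model would hold framework tensors instead. The `scale` coefficient used when applying a vector is an assumed interpolation knob borrowed from the general task-arithmetic formulation rather than a detail given here, and the "happy"/"sad" weights in the demo are made-up toy values.

```python
from typing import Dict, List

# Hypothetical stand-in for model weights: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def emotion_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Emotion vector = fine-tuned weights minus pre-trained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def negate(vec: Weights) -> Weights:
    """Negation: flips the direction of an emotion vector."""
    return {name: [-v for v in values] for name, values in vec.items()}


def add(a: Weights, b: Weights) -> Weights:
    """Addition: combines two emotion vectors (assumes identical parameter names)."""
    return {name: [x + y for x, y in zip(a[name], b[name])] for name in a}


def apply_vector(pretrained: Weights, vec: Weights, scale: float = 1.0) -> Weights:
    """New weights = pre-trained + scale * vector (scale is an assumed knob)."""
    return {
        name: [p + scale * v for p, v in zip(pretrained[name], vec[name])]
        for name in pretrained
    }


if __name__ == "__main__":
    # Toy weights for a hypothetical two-value "model".
    pretrained = {"w": [0.0, 1.0]}
    happy_ft = {"w": [0.5, 1.5]}   # hypothetical "happy" fine-tune
    sad_ft = {"w": [-0.5, 0.5]}    # hypothetical "sad" fine-tune

    tau_happy = emotion_vector(pretrained, happy_ft)
    tau_sad = emotion_vector(pretrained, sad_ft)

    print(apply_vector(pretrained, tau_happy))               # {'w': [0.5, 1.5]}
    print(apply_vector(pretrained, negate(tau_happy), 0.5))  # {'w': [-0.25, 0.75]}
    print(add(tau_happy, tau_sad))                           # {'w': [0.0, 0.0]}
```

Applying an emotion vector with `scale=1.0` recovers the fine-tuned weights exactly. Negating it or scaling it down moves the weights in the opposite direction or only part of the way, and adding two vectors produces a single combined direction that can then be applied to the same pre-trained weights.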