ISCA Archive Interspeech 2025

Gradual modeling of the Lombard effect by modifying speaker embeddings from a Text-To-Speech model

Thiago Henrique Gomes Lobato, Magnus Schäfer

This work proposes converting clean speech into Lombard-like speech by modifying the speaker embeddings used to condition a text-to-speech model. A feedforward network learns to map embeddings of plain speech to those of Lombard speech using paired data from the Audio-Visual Lombard Grid corpus. The signal level is then increased as per ITU-T P.1150, and a neural vocoder performs time stretching. We show that the resulting speech retains most of the speaker’s identity while incorporating relevant Lombard characteristics. Additionally, by properly interpolating embeddings, we propose an approach to gradually model Lombard speech as a function of the background noise level. Listening tests show an increase of about 1.12 points in mean opinion score (MOS) for speech plausibility in a loud background context, with only a 0.5-point MOS decrease in speaker similarity compared to an ideal Lombard speech interpolation.
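To make the idea of gradual modeling concrete, the following minimal sketch interpolates between a plain-speech embedding and its Lombard counterpart as a function of the background noise level. The abstract does not state the interpolation scheme, so everything here is an assumption made for illustration: the spherical interpolation with a linearly blended norm, the function names `lombard_weight` and `interpolate_embedding`, and the thresholds `quiet_db` and `loud_db` are hypothetical and not taken from the paper.

```python
import math


def lombard_weight(noise_db, quiet_db=40.0, loud_db=80.0):
    """Map a background noise level in dB to a weight in [0, 1].

    quiet_db and loud_db are hypothetical thresholds: at or below
    quiet_db the weight is 0 (plain speech), at or above loud_db it
    is 1 (full Lombard speech), with a linear ramp in between.
    """
    if loud_db <= quiet_db:
        raise ValueError("loud_db must be greater than quiet_db")
    w = (noise_db - quiet_db) / (loud_db - quiet_db)
    return min(1.0, max(0.0, w))


def interpolate_embedding(plain, lombard, w):
    """Blend two nonzero speaker embeddings with weight w in [0, 1].

    Assumed scheme: spherical interpolation of the directions and
    linear interpolation of the norms. w=0 returns the plain
    embedding and w=1 the Lombard embedding.
    """
    norm_p = math.sqrt(sum(x * x for x in plain))
    norm_l = math.sqrt(sum(x * x for x in lombard))
    u = [x / norm_p for x in plain]
    v = [x / norm_l for x in lombard]

    # Angle between the two unit vectors, clamped for numerical safety.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    theta = math.acos(dot)
    s = math.sin(theta)

    if s < 1e-6:
        # Nearly parallel (or degenerate antiparallel) directions:
        # fall back to a plain linear blend of the unit vectors.
        direction = [(1.0 - w) * a + w * b for a, b in zip(u, v)]
    else:
        c0 = math.sin((1.0 - w) * theta) / s
        c1 = math.sin(w * theta) / s
        direction = [c0 * a + c1 * b for a, b in zip(u, v)]

    scale = (1.0 - w) * norm_p + w * norm_l
    return [scale * d for d in direction]


# Usage: toy 2-D vectors stand in for real speaker embeddings.
plain_emb = [1.0, 0.0]
lombard_emb = [0.0, 2.0]
weight = lombard_weight(60.0)  # halfway between the two thresholds
mixed_emb = interpolate_embedding(plain_emb, lombard_emb, weight)
```

In a pipeline like the one described above, `lombard_emb` would be the output of the learned mapping network, and the blended embedding would condition the text-to-speech model. Spherical interpolation is used in this sketch because it avoids the shrinkage in norm that a straight linear blend of two differently oriented vectors produces; whether that property matters for the embeddings in the paper is not stated in the abstract.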