ISCA Archive Interspeech 2023

Real Time Detection of Soft Voice for Speech Enhancement

Hector A. Cordourier, Georg Stemmer, Sinem Aslan, Tobias Bocklet, Himanshu Bhalla

People in remote meetings held in open spaces might choose to speak with a restrained voice out of concern for privacy or to avoid disturbing others. Research shows that, to avoid being overheard during such calls, people prefer soft voice (voice with lower amplitude and pitch, but with harmonic tones in its spectrum) over whispered voice (voice with the lowest amplitude and no harmonics at all). We present a lightweight classifier based on a simple feed-forward neural network that takes as input the normalized log-Mel spectrum of voice captured by a headset and detects whether the person is using soft voice. This allows soft voice to be enhanced with more precision and responsiveness than regular amplitude-compensation ("auto-gain") systems provide. In this show and tell, we present a real-time demo of the voice classifier: viewers will see our algorithm distinguish soft voice from other voice types in real time, running on a regular PC with voice captured by a headset.
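The classification pipeline described above (normalized log-Mel features feeding a small feed-forward network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract specifies only the input representation and the network type. Everything else here is hypothetical: the 16 kHz sample rate, the 25 ms window and 10 ms hop, the 40 Mel bands, the per-frame normalization, the 32-unit hidden layer, and the names `normalized_log_mel` and `SoftVoiceMLP`. The weights are random placeholders, so the output probabilities are meaningless until the network is trained on labelled recordings.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed headset sample rate
N_FFT = 512          # FFT size (assumption)
WIN = 400            # 25 ms analysis window at 16 kHz (assumption)
HOP = 160            # 10 ms hop at 16 kHz (assumption)
N_MELS = 40          # number of Mel bands (assumption)


def hz_to_mel(f):
    # HTK-style Mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def normalized_log_mel(signal, fb):
    """Frame the signal, compute log-Mel energies, then normalize each frame."""
    window = np.hanning(WIN)
    n_frames = 1 + (len(signal) - WIN) // HOP
    frames = np.stack([signal[t * HOP:t * HOP + WIN] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2  # (n_frames, n_fft//2+1)
    log_mel = np.log(power @ fb.T + 1e-10)                      # (n_frames, n_mels)
    # Assumed normalization: zero mean and unit variance across bands in each
    # frame, so the network sees spectral shape rather than absolute level.
    mean = log_mel.mean(axis=1, keepdims=True)
    std = log_mel.std(axis=1, keepdims=True) + 1e-8
    return (log_mel - mean) / std


class SoftVoiceMLP:
    """Hypothetical two-layer feed-forward classifier with random placeholder weights."""

    def __init__(self, n_in=N_MELS, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, feats):
        """Per-frame probability of soft voice, shape (n_frames,)."""
        hidden = np.maximum(0.0, feats @ self.w1 + self.b1)  # ReLU hidden layer
        logits = (hidden @ self.w2 + self.b2)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))                # sigmoid output


fb = mel_filterbank()
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE      # 1 s of audio
audio = 0.05 * np.sin(2 * np.pi * 150.0 * t)  # synthetic low-level tone
feats = normalized_log_mel(audio, fb)
probs = SoftVoiceMLP().predict_proba(feats)
print(feats.shape, probs.shape)  # (98, 40) (98,)
```

In a real-time setting, the same feature function could be applied to each incoming block of headset audio, with the per-frame probabilities smoothed over a short window before they drive the enhancement gain. That streaming and smoothing arrangement is likewise an assumption, not something the abstract describes.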