ISCA Archive Interspeech 2025

DiffMV-ETS: Diffusion-based Multi-Voice Electromyography-to-Speech Conversion using Speaker-Independent Speech Training Targets

Kevin Scheck, Tom Dombeck, Zhao Ren, Peter Wu, Michael Wand, Tanja Schultz

Electromyography (EMG) signals have been investigated for novel voice prostheses that enable speech communication through silent articulation. In this work, we propose DiffMV-ETS, a multi-voice, diffusion-based EMG-to-speech system that converts EMG signals to speech in selectable voices. We evaluate it in scenarios where no speech of the speaker wearing the EMG sensors is used for training. For this purpose, we introduce EMG-VCTK, a dataset containing EMG and audio recordings of sentences from the Voice Cloning Toolkit (VCTK) corpus. We compare EMG models trained with audio of the same speaker, of auxiliary speakers, and of text-to-speech systems. Experiments indicate that models retain their intelligibility and naturalness when trained with synthetic speech, and that DiffMV-ETS enhances speech naturalness and similarity to unseen target voices. To the best of our knowledge, this is the first work to train multi-voice EMG-to-speech systems with speaker-independent targets.