ISCA Archive IberSPEECH 2024

Comparative Analysis of Mono-speaker and Multi-speaker Models for EMG-to-Speech Conversion

Eder del Blanco, Inge Salomons, Víctor García, Eva Navas, Inma Hernáez

Silent Speech Interfaces transform biosignals into audible speech, providing a voice to those who cannot speak. This study focuses on generating speech from electromyographic (EMG) signals captured from facial muscles during silent mouthing. Using a Transformer Encoder-based model and a multi-speaker Spanish dataset with both audio and EMG, this work studies whether a mono-speaker model outperforms a multi-speaker model. Additionally, we evaluate the influence of a speaker identification vector on model performance. Evaluation criteria include spectral feature distortion, intelligibility and voice similarity. Results indicate that the mono-speaker model outperforms multi-speaker models for speakers with a large amount of data. For speakers with less data, the multi-speaker model produces more accurate spectrograms, while the mono-speaker model better preserves the identity of the speaker’s voice. The speaker embedding vector did not consistently improve model performance.
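The speaker identification vector evaluated here is an extra input that tells the model which speaker produced the EMG signal. A minimal sketch of one common way to implement such conditioning, concatenating a one-hot speaker vector to every input feature frame (the dimensions, speaker count, and function names below are illustrative assumptions, not details from the paper):

```python
# Hedged sketch: condition frame-level EMG features on speaker identity
# by appending a one-hot speaker vector to each frame. All sizes here
# are illustrative assumptions, not values reported in the paper.

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def add_speaker_vector(frames, speaker_id, num_speakers):
    """Append a one-hot speaker vector to every EMG feature frame."""
    spk = one_hot(speaker_id, num_speakers)
    return [frame + spk for frame in frames]

# Example: 3 frames of 4-dim EMG features, conditioned on speaker 1 of 2.
frames = [[0.1, 0.2, 0.3, 0.4]] * 3
conditioned = add_speaker_vector(frames, speaker_id=1, num_speakers=2)
# Each conditioned frame now has 4 + 2 = 6 dimensions.
```

In practice a learned embedding (e.g. an x-vector or a trainable lookup table) would typically replace the one-hot vector, but the concatenation step is the same.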