Silent Speech Interfaces transform biosignals into audible speech, providing a voice to those who cannot speak. This study focuses on generating speech from electromyographic (EMG) signals captured from facial muscles during silent mouthing. Using a Transformer Encoder-based model and a multi-speaker Spanish dataset containing both audio and EMG recordings, this work studies whether a mono-speaker model outperforms a multi-speaker model. Additionally, we evaluate the influence of a speaker embedding vector on model performance. Evaluation criteria include spectral feature distortion, intelligibility, and voice similarity. Results indicate that the mono-speaker model outperforms the multi-speaker model for speakers with a large amount of data. For speakers with less data, the multi-speaker model produces more accurate spectrograms, while the mono-speaker model better preserves the identity of the speaker's voice. The speaker embedding vector did not consistently improve model performance.
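
The sketch below illustrates the kind of architecture the abstract describes: a Transformer encoder mapping frame-level EMG features to spectrogram frames, with an optional speaker embedding vector added to the encoder input for multi-speaker conditioning. It is not the authors' implementation; all dimensions, layer counts, and names (EMG_DIM, MEL_DIM, EMGToSpectrogram) are illustrative assumptions, and positional encodings are omitted for brevity.

```python
from typing import Optional

import torch
import torch.nn as nn

EMG_DIM = 8    # assumed number of EMG feature channels per frame
MEL_DIM = 80   # assumed number of mel-spectrogram bins per frame
D_MODEL = 256  # assumed Transformer model dimension


class EMGToSpectrogram(nn.Module):
    """Hypothetical EMG-to-spectrogram model with optional speaker conditioning."""

    def __init__(self, num_speakers: Optional[int] = None):
        super().__init__()
        self.input_proj = nn.Linear(EMG_DIM, D_MODEL)
        # Speaker identification vector, looked up per utterance (multi-speaker case).
        self.speaker_emb = (
            nn.Embedding(num_speakers, D_MODEL) if num_speakers else None
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, dim_feedforward=1024, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
        self.output_proj = nn.Linear(D_MODEL, MEL_DIM)

    def forward(
        self, emg: torch.Tensor, speaker_id: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        # emg: (batch, frames, EMG_DIM) -> spectrogram: (batch, frames, MEL_DIM)
        x = self.input_proj(emg)
        if self.speaker_emb is not None and speaker_id is not None:
            # Broadcast one speaker vector across all frames of the utterance.
            x = x + self.speaker_emb(speaker_id).unsqueeze(1)
        return self.output_proj(self.encoder(x))


# Example: a multi-speaker model conditioned on a speaker embedding vector.
model = EMGToSpectrogram(num_speakers=4)
mel = model(torch.randn(2, 100, EMG_DIM), speaker_id=torch.tensor([0, 3]))
print(mel.shape)  # torch.Size([2, 100, 80])
```

A mono-speaker variant under the same assumptions would simply omit num_speakers (and hence the embedding lookup), which is the contrast the abstract evaluates.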