ISCA Archive SIGUL 2023

The Applicability of Wav2Vec2 and Whisper for Low-Resource Maltese ASR

Aiden Williams, Andrea Demarco, Claudia Borg

Maltese is a low-resource language with limited digital tools, including for automatic speech recognition (ASR). With very limited datasets of Maltese speech available, a recent project, MASRI, developed further speech datasets and produced an initial prototype trained using the Jasper architecture. The best system achieved a word error rate (WER) of 55.05% on the MASRI test set. Our work builds upon this, producing a further two-and-a-half-hour annotated speech corpus from a domain in which no data was previously available (Parliament of Malta). Moreover, we experiment with existing pre-trained self-supervised models (Wav2Vec2.0 and Whisper) and further fine-tune these models on Maltese annotated data. A total of 30 Maltese ASR models are trained and evaluated using the WER and the character error rate (CER). The results indicate that model performance improves with the quantity of data, although not linearly. The best model achieves state-of-the-art results of 8.53% WER and 1.93% CER on a test set extracted from the CommonVoice project and 24.98% WER and 8.37% CER on the MASRI test set.
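
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, taken over words for WER and over characters for CER. It is not the authors' evaluation code. The function names are hypothetical, and the choices to split on whitespace, to count spaces as characters, and to apply no text normalisation are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenisation
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    # assumption: spaces are counted as ordinary characters
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, both rates can exceed 100% when the hypothesis is much longer than the reference. Reported scores also depend on normalisation choices (casing, punctuation, apostrophes), which this sketch leaves untouched.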


doi: 10.21437/SIGUL.2023-9

Cite as: Williams, A., Demarco, A., Borg, C. (2023) The Applicability of Wav2Vec2 and Whisper for Low-Resource Maltese ASR. Proc. 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023), 39-43, doi: 10.21437/SIGUL.2023-9

@inproceedings{williams23_sigul,
  author={Aiden Williams and Andrea Demarco and Claudia Borg},
  title={{The Applicability of Wav2Vec2 and Whisper for Low-Resource Maltese ASR}},
  year=2023,
  booktitle={Proc. 2nd Annual Meeting of the ELRA/ISCA SIG on Under-resourced Languages (SIGUL 2023)},
  pages={39--43},
  doi={10.21437/SIGUL.2023-9}
}