ISCA Archive Interspeech 2023

Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions

Joshua Jansen van Vüren, Thomas Niesler

We present three approaches to improve language modelling of under-resourced code-switched speech. First, we challenge the practice of fine-tuning large pre-trained language models on small datasets. Second, we investigate the advantages of sub-word encodings for our multilingual code-switched speech. Third, we propose an architectural innovation to the RNN language model that is specifically designed for code-switched text. We show a clear absolute word error rate reduction of 0.17% for the adapted LSTM language model compared to M-BERT when employed in n-best rescoring experiments. Furthermore, the LSTM models afford a seven-fold reduction in the total number of parameters and a 100-fold reduction in rescoring runtime. Contrary to recent research trends, our LSTM models using sub-word vocabularies do not outperform their word-level counterparts. Finally, the new architectural mechanism applied to the LSTM improves language prediction for a span of several words following a code-switch.
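To make the rescoring setup concrete, the sketch below shows a generic form of n-best rescoring in which first-pass ASR scores are interpolated with an external language model score. The function names, hypothesis format, interpolation weight, and toy code-switched sentences are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of n-best rescoring with an external language model.
# All names, scores, and the interpolation weight are illustrative assumptions.

def rescore_nbest(hypotheses, lm_score_fn, lm_weight=0.5):
    """Rerank n-best hypotheses by interpolating ASR and LM log-scores.

    hypotheses:  list of (text, asr_log_score) pairs from a first-pass decoder.
    lm_score_fn: callable returning a log-probability for a text string,
                 e.g. from an LSTM or masked language model.
    lm_weight:   interpolation weight for the language model score.
    """
    rescored = []
    for text, asr_score in hypotheses:
        combined = (1.0 - lm_weight) * asr_score + lm_weight * lm_score_fn(text)
        rescored.append((combined, text))
    # Return hypotheses ordered from best to worst combined score.
    return [text for _, text in sorted(rescored, reverse=True)]


if __name__ == "__main__":
    # Toy LM that simply favours shorter hypotheses; a real setup would
    # query the trained code-switched language model instead.
    toy_lm = lambda text: -0.1 * len(text.split())
    nbest = [("ek is baie happy", -12.3), ("ek is very happy", -12.9)]
    print(rescore_nbest(nbest, toy_lm))
```

In this framing, the runtime comparison in the abstract corresponds to how expensive `lm_score_fn` is per hypothesis: scoring with a compact LSTM is far cheaper than scoring with a large pre-trained model such as M-BERT.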