ISCA Archive IberSPEECH 2022

Semisupervised training of a fully bilingual ASR system for Basque and Spanish

Mikel Penagarikano, Amparo Varona, German Bordel, Luis J. Rodriguez-Fuentes

Automatic speech recognition (ASR) of speech signals with code-switching (an abrupt language change common in bilingual communities) typically requires spoken language recognition to obtain single-language segments. In this paper, we present a fully bilingual ASR system for Basque and Spanish that does not require such segmentation but handles both languages naturally, using a single set of acoustic units and a single (aggregated) language model. We also present the Basque Parliament Database (BPDB) used for the experiments in this work. A semisupervised method is applied, which starts by training baseline acoustic models on small acoustic datasets in Basque and Spanish. These models are then used to perform phone recognition on the BPDB training set, for which only approximate transcriptions are available. A similarity score derived from the alignment of the nominal and recognized phonetic sequences is used to rank the training segments. The acoustic models are then updated with those BPDB training segments whose similarity score exceeds a heuristically fixed threshold. With the updated models, the Word Error Rate (WER) dropped from 16.46% to 6.99% on the validation set and from 15.06% to 5.16% on the test set, that is, relative WER reductions of 57.53% and 65.74% over the baseline models, respectively.
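The segment selection step and the reported relative WER reductions can be illustrated with a minimal Python sketch. The abstract does not specify how the similarity score is computed, so the sketch assumes one minus the length-normalized edit distance between the nominal and recognized phone sequences. The function names, the segment tuple layout and any threshold value are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(nominal: Sequence[str], recognized: Sequence[str]) -> float:
    """Assumed score in [0, 1]: 1 - edit distance / longer sequence length."""
    longest = max(len(nominal), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(nominal, recognized) / longest


def select_segments(
    segments: Sequence[Tuple[str, Sequence[str], Sequence[str]]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Rank (segment_id, nominal, recognized) triples by score, best first,
    keeping only those whose score exceeds the threshold."""
    scored = [(seg_id, similarity(nom, rec)) for seg_id, nom, rec in segments]
    kept = [(seg_id, score) for seg_id, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def relative_wer_reduction(baseline: float, updated: float) -> float:
    """Relative WER reduction over the baseline, in percent."""
    return 100.0 * (baseline - updated) / baseline
```

Applied to the figures above, `relative_wer_reduction(16.46, 6.99)` and `relative_wer_reduction(15.06, 5.16)` round to 57.53 and 65.74, matching the reported relative reductions. The threshold passed to `select_segments` would have to be tuned heuristically, as the abstract indicates.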