ISCA Archive IberSPEECH 2024

HiTZ-AhoLab ASR System for the Albayzin Bilingual Basque-Spanish Speech to Text Challenge

Asier Herranz, Adrián García-Sebastián, Christoforos Souganidis, Victor García-Romillo, Aitor Bellanco, Eva Navas, Inma Hernáez-Rioja, Ibon Saratxaga

This paper describes the development of the HiTZ-AhoLab speech-to-text system for the Albayzín Bilingual Basque-Spanish Speech to Text Challenge (BBS-S2TC), organized by the University of the Basque Country (UPV/EHU). All systems were trained with the tools of the NVIDIA NeMo framework, using the Conformer-Transducer architecture with Byte Pair Encoding (BPE) tokenization, on a total of 1622 hours of training data composed of Spanish, Basque and bilingual (code-switched) utterances. The proposed system achieved a word error rate (WER) of 2.67% on the test subset and 3.02% on the development subset of the Basque Parliament dataset, using mAES decoding with language-model-based rescoring. It was submitted to the challenge as the primary system, alongside the same system with the base greedy decoding method as the contrastive system, which obtained WERs of 2.74% and 3.15% on the same test and development subsets.
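
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), together with the relative difference between the two submitted systems. It is not the challenge's official scoring tool: the function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace and no normalization
    (casing, punctuation) is applied; real scoring tools usually normalize.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# WER figures reported in the abstract (percent): greedy vs. mAES + LM rescoring.
test_gain = relative_reduction(2.74, 2.67)
dev_gain = relative_reduction(3.15, 3.02)
```

With the figures reported above, the primary system's WER is roughly 2.6% lower in relative terms than the contrastive system's on the test subset, and roughly 4.1% lower on the development subset.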