This paper describes Telefónica I+D's participation in the IberSPEECH-RTVE 2022 Speech-to-Text Transcription Challenge. We built an end-to-end acoustic Automatic Speech Recognition (ASR) system based on the large XLS-R architecture. We first trained it on already-aligned data from CommonVoice, and then adapted it to the TV broadcast domain with a self-supervised method. For that purpose, we used an iterative pseudo-forced alignment algorithm fed with frame-wise character posteriors produced by our ASR, which allowed us to recover up to 166 hours of speech from the RTVE2018 and RTVE2022 databases. We additionally explored using a transformer-based seq2seq translation system as a Language Model (LM) to correct the transcripts of the acoustic ASR. Our best system achieved a 24.27% WER on the test split of RTVE2020.
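The pseudo-forced alignment idea can be illustrated with a minimal sketch: given frame-wise character log-posteriors from the ASR and a reference transcript, a monotonic Viterbi pass assigns each character a span of frames. This is a hypothetical, heavily simplified toy (no blank symbol, no iterative anchoring as in the paper's actual algorithm; `forced_align` and the toy data are invented names for illustration):

```python
import math

def forced_align(log_post, target):
    """Toy monotonic forced alignment (hypothetical sketch).
    log_post: T x V list of per-frame character log-posteriors.
    target: sequence of character ids to align.
    Returns the first frame assigned to each target character."""
    T, S = len(log_post), len(target)
    NEG = -math.inf
    dp = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]  # 0 = stay on char, 1 = advance
    dp[0][0] = log_post[0][target[0]]
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1][s]
            move = dp[t - 1][s - 1] if s > 0 else NEG
            if move > stay:
                dp[t][s], back[t][s] = move, 1
            else:
                dp[t][s], back[t][s] = stay, 0
            dp[t][s] += log_post[t][target[s]]
    # Backtrack to recover the first frame of each character.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if back[t][s]:
            starts[s] = t
            s -= 1
    return starts

# Toy posteriors over a 2-character vocabulary {0: 'a', 1: 'b'}:
# frames 0-1 favour 'a', frames 2-3 favour 'b'.
log_post = [[-0.1, -3.0], [-0.1, -3.0], [-3.0, -0.1], [-3.0, -0.1]]
starts = forced_align(log_post, [0, 1])  # 'a' starts at frame 0, 'b' at frame 2
```

In the actual system described above, posteriors come from the XLS-R model and the alignment is applied iteratively over long broadcast recordings to recover reliably transcribed segments.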