ISCA Archive Interspeech 2023

The MALACH Corpus: Results with End-to-End Architectures and Pretraining

Michael Picheny, Qin Yang, Daiheng Zhang, Lining Zhang

The MALACH corpus contains approximately 375 hours of audio from Holocaust survivor testimonies, with transcripts available for approximately half of the data. It is an extremely difficult corpus for speech recognition, encompassing accented, emotional speech, in many cases from elderly survivors. Nevertheless, significant progress has been made on speech recognition on MALACH, with word error rates (WERs) now typically hovering around 20% for hybrid speech recognition systems. The purpose of this paper is to examine whether recent developments in end-to-end architectures and pretraining with self-supervision continue to drive down error rates as they do on popular read-speech corpora such as Librispeech. We also experiment with leveraging the large fraction of unlabeled corpus data by using pseudolabels produced by previously trained systems. We find that the best system, a fine-tuned wav2vec2 model trained on labeled and pseudolabeled data, achieves a WER of 13.5%, a substantial improvement over the roughly 20% WER typical of hybrid systems.
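Since every result above is reported as a WER, a minimal sketch of the standard metric may be useful: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the text normalization and scoring tools actually used for MALACH are not specified here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation, disfluencies).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.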