ISCA Archive Interspeech 2022

pMCT: Patched Multi-Condition Training for Robust Speech Recognition

Pablo Peso Parada, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

We propose a novel Patched Multi-Condition Training (pMCT) method for robust Automatic Speech Recognition (ASR). pMCT employs Multi-condition Audio Modification and Patching (MAMP), which mixes patches of the same utterance extracted from clean and distorted speech. Training on these patch-modified signals improves the robustness of models in noisy reverberant scenarios. The proposed pMCT is evaluated on the LibriSpeech dataset, showing improvements over vanilla Multi-Condition Training (MCT). For analyses of robust ASR, we evaluated pMCT on the VOiCES dataset, a noisy reverberant dataset created from LibriSpeech utterances. In these analyses, pMCT achieves a 23.1% relative WER reduction compared to MCT.
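The patch-mixing idea behind MAMP can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch length, mixing probability, and function name are assumptions for the example, and the real method may select and combine patches differently.

```python
import numpy as np

def mix_patches(clean, distorted, patch_len=1600, p_distorted=0.5, rng=None):
    """Sketch of patch mixing: split the clean and distorted versions of
    the same utterance into fixed-length patches and, for each patch,
    randomly take the samples from either the clean or the distorted
    signal. Parameter values here are illustrative, not from the paper."""
    rng = rng or np.random.default_rng()
    n = min(len(clean), len(distorted))  # align the two versions
    out = np.empty(n, dtype=clean.dtype)
    for start in range(0, n, patch_len):
        end = min(start + patch_len, n)
        # pick the source signal for this patch at random
        src = distorted if rng.random() < p_distorted else clean
        out[start:end] = src[start:end]
    return out
```

With `p_distorted=0` the output is the clean utterance and with `p_distorted=1` it is the fully distorted one; intermediate values yield the mixed training signals.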