This paper studies the robustness of discriminatively trained acoustic models for large vocabulary continuous speech recognition. The popular discriminative criteria of maximum mutual information (MMI), minimum phone error (MPE), and minimum phone frame error (MPFE) are used in experiments that include realistic mismatched conditions from the Finnish Speecon corpus and the English Wall Street Journal corpus. A simple regularization method for discriminative training is proposed and shown to improve the robustness of the acoustic models, yielding consistent improvements in noisy conditions.
Index Terms: speech recognition, discriminative training, robustness