ISCA Archive Interspeech 2015
ISCA Archive Interspeech 2015

Distance-aware DNNs for robust speech recognition

Yajie Miao, Florian Metze

Distant speech recognition (DSR) remains to be an open challenge, even for the state-of-the-art deep neural network (DNN) models. Previous work has attempted to improve DNNs under constantly distant speech. However, in real applications, the speaker-microphone distance (SMD) can be quite dynamic, varying even within a single utterance. This paper investigates how to alleviate the impact of dynamic SMD on DNN models. Our solution is to incorporate the frame-level SMD information into DNN training. Generation of the SMD information relies on a universal extractor that is learned on a meeting corpus. We study the utility of different architectures in instantiating the SMD extractor. On our target acoustic modeling task, two approaches are proposed to build distance-aware DNN models using the SMD information: simple concatenation and distance adaptive training (DAT). Our experiments show that in the simplest case, incorporating the SMD descriptors improves word error rates of DNNs by 5.6% relative. Further optimizing SMD extraction and integration results in more gains.