A pivotal question in Automatic Speech Recognition (ASR) is the robustness
of the trained models. In this study, we investigate the combination
of two methods commonly applied to increase the robustness of ASR systems.
On the one hand, inspired by auditory experiments and signal processing
considerations, multi-band processing has been used for decades
to improve the noise robustness of speech recognition. On the other
hand, dropout is a commonly used regularization technique to prevent
overfitting by keeping the model from becoming over-reliant on a small
set of neurons. We hypothesize that a careful combination of the
two approaches would lead to increased robustness by preventing the
resulting model from over-relying on any given band. To verify our hypothesis,
we investigate various approaches for the combination of the two methods
using the Aurora-4 corpus. The results obtained corroborate our initial
assumption and show that a proper combination of the two techniques
leads to increased robustness and to significantly lower word error
rates (WERs). Furthermore, we find that the accuracy scores attained
here compare favourably to those reported recently on the clean training
scenario of the Aurora-4 corpus.
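
To make the idea concrete, the following is a minimal sketch of one way dropout could be applied at the level of whole frequency bands rather than individual neurons. It is an illustrative assumption, not necessarily any of the combination schemes evaluated in this study: the function name `band_dropout`, the equal-width band split, and the parameters `num_bands` and `drop_prob` are all hypothetical.

```python
import random


def band_dropout(frame, num_bands, drop_prob, rng):
    """Zero out whole frequency bands of one feature frame (training time only).

    frame     : list of spectral features (e.g. filterbank energies) for one frame
    num_bands : number of equal-width contiguous bands (hypothetical parameter)
    drop_prob : probability of dropping each band independently (hypothetical)
    rng       : a random.Random instance, passed in for reproducibility
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if num_bands <= 0 or len(frame) % num_bands != 0:
        raise ValueError("frame length must be a multiple of num_bands")

    band_size = len(frame) // num_bands
    # Inverted-dropout scaling keeps the expected value of each feature unchanged.
    scale = 1.0 / (1.0 - drop_prob)

    out = []
    for b in range(num_bands):
        keep = rng.random() >= drop_prob  # one decision per band, not per feature
        segment = frame[b * band_size:(b + 1) * band_size]
        out.extend(v * scale if keep else 0.0 for v in segment)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [1.0] * 8  # toy frame with 8 spectral features
    print(band_dropout(frame, num_bands=4, drop_prob=0.5, rng=rng))
```

Because the keep/drop decision is made once per band, a model trained on such inputs regularly sees frames in which an entire band is missing, which is the mechanism hypothesized above to discourage over-reliance on any single band. At test time the frame would be passed through unchanged.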