ISCA Archive Interspeech 2025

EmbedAug: An Augmentation Scheme for End-to-End Automatic Speech Recognition

Ashish Panda, Sunil Kumar Kopparapu

Data augmentation plays a significant role in making automatic speech recognition (ASR) systems robust against unseen test data. Most existing data augmentation techniques are designed to work on the speech features. Augmentation of speech embeddings within the neural network, e.g., the inputs to encoders, is relatively unexplored. We present a simple yet effective augmentation scheme, EmbedAug, which works by replacing a set of randomly selected speech embeddings with either zeros or Gaussian noise during training. EmbedAug does not require additional data, works online during training, and adds very little to the overall computational cost. Using the Librispeech 100h, Librispeech 960h, and MUCS21 multilingual datasets, we show that the proposed EmbedAug is very effective in improving the robustness of ASR systems. Moreover, EmbedAug can be fine-tuned on the development set with the help of just one hyperparameter.
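
The replacement step described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `embed_aug`, the per-frame replacement probability `replace_prob` (assumed here to stand in for the single hyperparameter, which the abstract does not name), the noise scale `noise_std`, and the choice to select frames independently are all assumptions made for illustration.

```python
import random


def embed_aug(embeddings, replace_prob=0.1, mode="zero",
              noise_std=1.0, training=True, rng=None):
    """Return a copy of `embeddings` with randomly chosen frames replaced.

    embeddings   : list of frames, each frame a list of floats
    replace_prob : ASSUMED hyperparameter; chance that a frame is replaced
    mode         : "zero" (all-zero frame) or "gaussian" (noise frame)
    noise_std    : ASSUMED standard deviation of the Gaussian noise
    training     : augmentation is applied only when this is True
    rng          : optional random.Random instance for reproducibility
    """
    if mode not in ("zero", "gaussian"):
        raise ValueError("mode must be 'zero' or 'gaussian'")
    if not training:
        # At inference time the embeddings pass through unchanged.
        return [list(frame) for frame in embeddings]

    rng = rng or random.Random()
    augmented = []
    for frame in embeddings:
        if rng.random() < replace_prob:
            if mode == "zero":
                augmented.append([0.0] * len(frame))
            else:
                augmented.append([rng.gauss(0.0, noise_std) for _ in frame])
        else:
            augmented.append(list(frame))
    return augmented


# Hypothetical usage: 5 frames of 4-dimensional embeddings.
frames = [[float(t + d) for d in range(4)] for t in range(5)]
noisy = embed_aug(frames, replace_prob=0.3, mode="gaussian",
                  rng=random.Random(0))
```

In a real ASR model the same idea would presumably be applied to a batch of tensors at the encoder input, on the fly for each training step, which is consistent with the abstract's claim that the scheme is online and needs no extra data.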