ISCA Archive Interspeech 2023

Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation Using Improved Differentiable Automatic Data Augmentation

Toki Sugiura, Hiromitsu Nishizaki

Data augmentation is a common way to train machine learning models robustly on a small dataset. It typically applies pre-defined data processing operations to the input data at random, regardless of the characteristics of that data. However, some data processing operations may be inappropriate for certain data. In this study, we propose a new method that automatically searches for the best data processing operations for each sound file to be input into a sound classification neural network. The proposed method improves on the previously proposed differentiable automatic data augmentation (DADA), which uses a differentiable neural network to select the optimal data processing operations. We evaluated the proposed method on an acoustic scene classification task on the ESC-50 dataset and demonstrated that it can train a more robust model than the original DADA-based data augmentation.
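To make the idea of differentiable operation selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gumbel-softmax relaxation over a small set of candidate operations and blends their outputs by the resulting weights. The operations (`identity`, `gain`, `time_shift`, `polarity_invert`), the function names, and the single shared `logits` vector are all hypothetical choices made for illustration; the proposed method selects operations per sound file, which this sketch does not model.

```python
import math
import random

# Hypothetical candidate operations on a waveform given as a list of floats.
def identity(wave):
    return list(wave)

def gain(wave, factor=0.5):
    return [factor * x for x in wave]

def time_shift(wave, shift=1):
    # Circular shift to the right by `shift` samples.
    if not wave:
        return []
    shift %= len(wave)
    return wave[-shift:] + wave[:-shift] if shift else list(wave)

def polarity_invert(wave):
    return [-x for x in wave]

OPERATIONS = [identity, gain, time_shift, polarity_invert]

def gumbel_softmax(logits, temperature, rng):
    """Draw relaxed one-hot weights over the operations.

    Adds Gumbel(0, 1) noise to each logit, then applies a
    temperature-scaled softmax. Lower temperatures push the
    weights closer to a hard one-hot choice.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def soft_augment(wave, logits, temperature=0.5, rng=None):
    """Return a weighted blend of all operation outputs and the weights.

    Because the output is a smooth function of `logits`, an autodiff
    framework could propagate a training loss back to the selection
    parameters. This plain-Python version only shows the forward pass.
    """
    if rng is None:
        rng = random.Random()
    weights = gumbel_softmax(logits, temperature, rng)
    outputs = [op(wave) for op in OPERATIONS]
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(wave))
    ]
    return mixed, weights
```

Calling `soft_augment(wave, [0.0, 0.0, 0.0, 0.0])` starts from a uniform preference over the four operations. In a trainable setting the logits would be learned parameters, possibly predicted from the input itself, and the forward pass would be written in a framework that supports automatic differentiation.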