ISCA Archive Interspeech 2022

Novel Augmentation Schemes for Device Robust Acoustic Scene Classification

Sukanya Sonowal, Anish Tamse

In audio classification tasks, recordings are typically available from only a few microphones, while the deployed system may encounter a much wider range of microphones. This paper discusses augmentation methods for acoustic scene classification with the aim of improving performance on recordings from unseen microphones. The proposed augmentation schemes fall into two broad categories. The first category, called the frequency response augmentation technique, aims to artificially generate 'new' microphone frequency responses. This is achieved by collecting microphone impulse responses from a publicly available library and applying image augmentation techniques to them to create a more diverse set of frequency responses. The training data is then augmented with these artificially generated frequency responses. The second category consists of the amplitude augmentation and random frame drop methods, which are simple yet effective in further boosting performance. We test all of these augmentation methods on various architectures and observe a classification accuracy of 76.0% on the DCASE 2020 Task 1a set. On unseen devices in particular, our best reported accuracy, without using any model ensembles, is 74.24%.
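The three augmentation ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which image augmentation operations are used, how the responses are applied, or what parameter ranges are chosen. The function names, the shift-and-scale perturbation, the gain range, and the drop probability below are all hypothetical choices made for illustration, and the input is assumed to be a log-mel spectrogram in dB of shape (n_mels, n_frames).

```python
import numpy as np


def perturb_response(response_db, rng, max_shift=4):
    """Derive a 'new' frequency response curve from a measured one.

    Assumption: a circular shift along the frequency axis plus a random
    scaling stands in for the unspecified image augmentation techniques.
    """
    shift = int(rng.integers(-max_shift, max_shift + 1))
    scale = rng.uniform(0.5, 1.5)
    return np.roll(response_db, shift) * scale


def apply_response(log_mel, response_db):
    """Apply a per-band gain curve (dB) to a log-mel spectrogram (dB).

    Adding in the log domain corresponds to multiplying band energies,
    which approximates filtering by a device frequency response.
    """
    return log_mel + response_db[:, None]


def amplitude_augment(waveform, rng, low=0.5, high=1.5):
    """Scale the whole waveform by one random gain (hypothetical range)."""
    return waveform * rng.uniform(low, high)


def random_frame_drop(spec, rng, drop_prob=0.1):
    """Remove each time frame independently with probability drop_prob."""
    keep = rng.random(spec.shape[1]) >= drop_prob
    if not keep.any():
        # Always keep at least one frame so the output is never empty.
        keep[rng.integers(spec.shape[1])] = True
    return spec[:, keep]


rng = np.random.default_rng(0)
log_mel = np.zeros((8, 20))           # toy spectrogram: 8 bands, 20 frames
measured_db = np.linspace(-3, 3, 8)   # toy stand-in for a measured response

new_response = perturb_response(measured_db, rng)
augmented = apply_response(log_mel, new_response)
augmented = random_frame_drop(augmented, rng, drop_prob=0.3)
scaled_wave = amplitude_augment(np.ones(100), rng)
```

In this sketch the frequency response is applied to the features rather than convolved with the waveform, which keeps the example short; convolving the audio with an impulse response would be an equally plausible reading of the abstract.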