ISCA Archive Interspeech 2023

Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection

Wei-Cheng Lin, Luca Bondi, Shabnam Ghaffarzadegan

Data augmentation is a key component in achieving robust and generalizable performance in sound event detection (SED). A well-trained SED model should resist interference from non-target audio events and maintain a robust recognition rate under unknown and possibly mismatched testing conditions. In this study, we propose a novel background domain switch (BDS) data augmentation technique for SED. BDS uses a trained SED model on-the-fly to detect backgrounds in audio clips, and switches them among the data points to increase sample variability. This approach can be easily combined with other types of data augmentation techniques. We evaluate the effectiveness of BDS by applying it to several state-of-the-art SED frameworks, using both publicly available datasets and a synthesized mismatched test set. Experimental results consistently show that BDS obtains significant performance improvements across all evaluation aspects. The code is available at: https://github.com/boschresearch/soundseebackgrounddomainswitch
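The background-switching idea can be illustrated with a minimal sketch. It is not the released implementation; it only mirrors the description above under several assumptions: each clip is a list of per-frame values, a trained SED model has already produced one event probability per frame, a frame counts as background when that probability falls below a threshold, and two clips exchange frames only where both are background. The function names, the threshold value, and the pairing rule are all hypothetical.

```python
from typing import List, Sequence, Tuple


def background_mask(event_probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark frames as background where the (assumed) SED model's
    event probability is below the threshold."""
    return [p < threshold for p in event_probs]


def switch_backgrounds(
    clip_a: Sequence[float],
    probs_a: Sequence[float],
    clip_b: Sequence[float],
    probs_b: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Swap frames between two clips wherever both are background.

    Frames that contain a detected event in either clip are left untouched,
    so the event labels of both clips remain valid after the swap.
    """
    if not (len(clip_a) == len(probs_a) == len(clip_b) == len(probs_b)):
        raise ValueError("clips and probability tracks must have equal length")

    bg_a = background_mask(probs_a, threshold)
    bg_b = background_mask(probs_b, threshold)

    out_a, out_b = list(clip_a), list(clip_b)
    for i, (a_is_bg, b_is_bg) in enumerate(zip(bg_a, bg_b)):
        if a_is_bg and b_is_bg:
            # Exchange background content; event frames stay in place.
            out_a[i], out_b[i] = clip_b[i], clip_a[i]
    return out_a, out_b


if __name__ == "__main__":
    # Toy data: one number per frame stands in for an audio frame.
    a, b = switch_backgrounds(
        [1, 2, 3, 4], [0.1, 0.9, 0.2, 0.1],
        [10, 20, 30, 40], [0.0, 0.1, 0.8, 0.3],
    )
    print(a, b)
```

In this toy setup only frames 0 and 3 are background in both clips, so only those frames are exchanged. In a training loop, such a swap would be applied to pairs of clips within a batch before other augmentations.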