ISCA Archive Interspeech 2023

Background-aware Modeling for Weakly Supervised Sound Event Detection

Yifei Xin, Dongchao Yang, Yuexian Zou

Multiple instance learning (MIL) is a common framework for weakly supervised sound event detection (WSSED). However, because MIL directly optimizes clip-level classification results, it tends to localize only the most distinctive part of a sound event rather than the entire event, so the less discriminative parts are mistakenly identified as background sounds. In this paper, we add background awareness to WSSED by proposing a learning structure called BA-WSSED. BA-WSSED first introduces a pseudo separator with softmax activation and two aggregators to purify and aggregate the event feature and the background feature, respectively. Then, with the help of the proposed background-aware staggered (BAS) loss, the event classifier and the background classifier learn to generate staggered classification scores that discern and suppress background sounds. Experiments show that BA-WSSED significantly improves the performance of the general MIL-based WSSED method on multiple datasets and can be applied to various baseline models.
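As a rough illustration of the separator-and-aggregator idea described above, the following minimal NumPy sketch splits frame-level features into an event stream and a background stream. It is not the authors' implementation. The linear separator `w_sep`, the weighted-average aggregators, and all shapes are assumptions made for illustration, and the BAS loss and the two classifiers are not shown because the abstract does not specify their form.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pseudo_separate(features, w_sep):
    """Assign each frame a soft event weight and a soft background weight.

    features: (T, D) frame-level features.
    w_sep:    (D, 2) hypothetical linear separator (an assumption).
    Returns two (T,) arrays that sum to 1 at every frame.
    """
    logits = features @ w_sep          # (T, 2)
    masks = softmax(logits, axis=-1)   # softmax across the event/background pair
    return masks[:, 0], masks[:, 1]


def aggregate(features, weights, eps=1e-8):
    """Weighted average pooling over time: (T, D) and (T,) -> (D,)."""
    return (weights[:, None] * features).sum(axis=0) / (weights.sum() + eps)


rng = np.random.default_rng(0)
T, D = 100, 16                          # illustrative sizes only
features = rng.standard_normal((T, D))
w_sep = rng.standard_normal((D, 2))

event_w, bg_w = pseudo_separate(features, w_sep)
event_feat = aggregate(features, event_w)   # clip-level event feature
bg_feat = aggregate(features, bg_w)         # clip-level background feature
```

Because the softmax is taken across the event/background pair, every frame's weight is shared between the two streams, so a frame that contributes little to the event feature contributes correspondingly more to the background feature. In a full system, `event_feat` and `bg_feat` would be passed to separate classifiers.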