ISCA Archive Interspeech 2025

Real-time TSE demonstration via SoundBeam with KD

Keigo Wakayama, Tomoko Kawase, Takafumi Moriya, Marc Delcroix, Hiroshi Sato, Tsubasa Ochiai, Masahiro Yasuda, Shoko Araki

Target sound extraction (TSE) aims to extract sound sources of a specified class from a mixture. TSE has been actively studied with a view to applications in immersive systems and auditory devices. We propose to demonstrate a real-time TSE system that isolates the signal of a desired sound class from sound mixtures recorded on the fly. The demonstration is based on the recently proposed causal SoundBeam model, which is trained using knowledge distillation (KD) from a non-causal TSE system. Experiments have shown that SoundBeam with KD achieves higher extraction accuracy than Waveformer, a state-of-the-art (SOTA) TSE model. This paper describes the implementation of the proposed real-time TSE demonstration system. Notably, this demonstration will show, for the first time at Interspeech, the extraction of signals of a selected sound event (SE) class in real time on a laptop.
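As a rough illustration of training a causal student with KD from a non-causal teacher, the sketch below blends a ground-truth term with a teacher-matching term. The abstract does not state the actual objective, so the negative-SNR loss, the weight `alpha`, and the names `neg_snr_db` and `kd_loss` are all assumptions made for illustration, and the signals are synthetic stand-ins for model outputs.

```python
import numpy as np

def neg_snr_db(estimate, reference, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better)."""
    noise = reference - estimate
    ratio = (np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps)
    return -10.0 * np.log10(ratio)

def kd_loss(student_out, teacher_out, target, alpha=0.5):
    """Hypothetical KD objective (assumption, not the paper's stated loss).

    alpha weights the ground-truth term; (1 - alpha) weights the term that
    pulls the causal student toward the non-causal teacher's output.
    """
    supervised = neg_snr_db(student_out, target)
    distill = neg_snr_db(student_out, teacher_out)
    return alpha * supervised + (1.0 - alpha) * distill

# Synthetic stand-ins: one second of "audio" at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
teacher_out = target + 0.05 * rng.standard_normal(16000)   # accurate teacher
good_student = target + 0.1 * rng.standard_normal(16000)   # close to target
poor_student = target + 0.5 * rng.standard_normal(16000)   # far from target

loss_good = kd_loss(good_student, teacher_out, target)
loss_poor = kd_loss(poor_student, teacher_out, target)
```

In this toy setup the student output that is closer to both the target and the teacher receives the lower loss, which is the behavior a distillation objective of this kind is meant to reward.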