ISCA Archive Interspeech 2025

First Analyze Then Enhance: A Task-Aware System for Speech Separation, Denoising, and Dereverberation

Shaoxiang Dang, Li Li, Shogo Seki, Hiroaki Kudo
This paper presents the First Analyze Then Enhance (FATE) framework for speech enhancement. In FATE, observed signals are first classified by degradation type and then enhanced using modules tailored to each type. This design prevents overprocessing of clean signals and reduces processing complexity by skipping unnecessary procedures. The paper focuses on commonly encountered degradations: additive noise, reverberation, speech mixing, and their combinations. To address these degradations, FATE includes a separation submodule and a denoising/dereverberation submodule as its enhancement models. Degradation types are predicted using features extracted from pretrained models based on automatic speech recognition and self-supervised learning. Experiments show that FATE can accurately identify undegraded signals and achieve comparable enhancement performance for degraded signals in each scenario while optimizing processing complexity.
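The analyze-then-enhance routing described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Degradation`, `plan_modules`, and `enhance` are hypothetical, the degradation flags are assumed to come from an upstream classifier, and running separation before denoising/dereverberation is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]  # placeholder for a waveform
Module = Callable[[List[Signal]], List[Signal]]  # maps streams to streams


@dataclass(frozen=True)
class Degradation:
    """Hypothetical classifier output: one flag per degradation type."""
    noisy: bool = False
    reverberant: bool = False
    mixed: bool = False  # more than one speaker present


def plan_modules(d: Degradation) -> List[str]:
    """Return the names of the submodules to run, in order.

    The ordering (separation first) is an assumption of this sketch.
    """
    steps: List[str] = []
    if d.mixed:
        steps.append("separation")
    if d.noisy or d.reverberant:
        steps.append("denoise_dereverb")
    return steps


def enhance(signal: Signal, d: Degradation,
            modules: Dict[str, Module]) -> List[Signal]:
    """Run only the planned submodules; a clean signal passes through untouched."""
    streams = [signal]
    for name in plan_modules(d):
        streams = modules[name](streams)
    return streams
```

With this structure, a signal classified as undegraded yields an empty plan, so no enhancement model is invoked, which mirrors the stated goal of avoiding overprocessing and unnecessary computation.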