ISCA Archive Interspeech 2025

Boosting StoRM Convergence with Metric Guidance and Non-uniform State-Sampling for Optimal Dereverberation

Chandra Mohan Sharma, Arnab Kumar Roy, Anupam Mandal, Prasanta Kumar Ghosh, Prasanna Kumar Kr

This paper proposes a novel approach to suppressing late reverberation, which arises when clean speech is convolved with a room impulse response and degrades speech intelligibility. Our method combines metric-guided training and non-uniform state sampling within the Stochastic Regeneration Model (StoRM) diffusion architecture, enabling better diffusion variability modeling while maintaining computational efficiency. Metric losses, namely STFT loss, spectral convergence loss, Mel Frequency Cepstral Coefficient (MFCC) loss, and log-magnitude loss, guide the regeneration process, reducing the number of training epochs needed for convergence by ~19.6% while slightly improving dereverberation. Meanwhile, the non-uniform state sampling approach speeds up training convergence by ~27.2% with practically similar perceptual performance. We evaluate the impact of these modifications on automatic speech recognition and clean speech distortion relative to the baseline, demonstrating optimal speech-quality-aware performance.
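As an illustration of two of the guidance metrics named above, the following is a minimal sketch of a spectral convergence loss and a log-magnitude loss computed on STFT magnitudes. The definitions follow their common usage in neural vocoder training and are an assumption here; the abstract does not give the exact formulations, weights, or STFT settings used in the paper. The function names, `n_fft`, `hop`, and the weights `w_sc` and `w_mag` are hypothetical.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; one frame per row (assumed settings)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_convergence(est_mag, ref_mag, eps=1e-8):
    """Frobenius norm of the magnitude error, normalized by the reference norm."""
    return np.linalg.norm(ref_mag - est_mag) / (np.linalg.norm(ref_mag) + eps)


def log_magnitude_loss(est_mag, ref_mag, eps=1e-7):
    """Mean absolute difference between log-magnitude spectrograms."""
    return np.mean(np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps)))


def guidance_loss(est, ref, w_sc=1.0, w_mag=1.0):
    """Weighted sum of the two spectral terms (weights are hypothetical)."""
    est_mag = stft_magnitude(est)
    ref_mag = stft_magnitude(ref)
    return (
        w_sc * spectral_convergence(est_mag, ref_mag)
        + w_mag * log_magnitude_loss(est_mag, ref_mag)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(4000)  # stand-in for a clean speech signal
    # Toy room impulse response: direct path plus an exponentially decaying tail.
    rir = np.exp(-np.arange(800) / 200.0) * rng.standard_normal(800)
    rir[0] = 1.0
    reverberant = np.convolve(clean, rir)[: len(clean)]
    print("loss(clean, clean)       =", guidance_loss(clean, clean))
    print("loss(reverberant, clean) =", guidance_loss(reverberant, clean))
```

In a training setup such terms would typically be added to the model's main objective, with the estimate produced by the network rather than by a synthetic convolution as in this toy example.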