ISCA Archive Interspeech 2024

Locally Aligned Rectified Flow Model for Speech Enhancement Towards Single-Step Diffusion

Zhengxiao Li, Nakamasa Inoue

Diffusion models based on stochastic differential equations have been shown to be effective in speech enhancement, the task of recovering clean speech signals from noisy ones. However, these models are computationally expensive, mainly because the reverse diffusion process requires a large number of function evaluations. To address this limitation, we propose the locally aligned rectified flow (LARF) model, a diffusion model based on ordinary differential equations that learns a transport mapping between the distributions of clean and noisy speech features. By introducing global and local flow matching losses, LARF constrains the transport mapping to be as straight as possible, which reduces the number of function evaluations required. In experiments, we demonstrate the effectiveness of LARF on two speech enhancement datasets: WSJ0-CHiME3 and VoiceBank-DEMAND. On WSJ0-CHiME3, LARF achieved a PESQ of 2.95 and an SI-SDR of 19.3 dB with a single step.
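The link between straight transport paths and fewer function evaluations can be illustrated with a minimal sketch. It is not the LARF implementation: the global and local losses are not specified here, so the code shows only the generic rectified-flow regression target and a fixed-step Euler solver. The noisy-to-clean direction, the toy 2-D vectors standing in for speech features, and all function names (`interpolate`, `straight_velocity`, `flow_matching_loss`, `euler_solve`) are assumptions made for illustration.

```python
import math

def interpolate(x0, x1, t):
    """Point at time t on the straight line from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def straight_velocity(x0, x1):
    """Constant velocity of the straight-line path: d/dt interpolate = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(pred_v, x0, x1):
    """Mean squared error between a predicted velocity and the straight-line target."""
    target = straight_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred_v, target)) / len(target)

def euler_solve(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps.

    Each step costs one function evaluation, so num_steps is the NFE.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Toy stand-ins for a noisy feature vector (start) and a clean one (end).
noisy = [1.0, 0.0]
clean = [0.0, 1.0]

# Straight field: constant velocity, as an ideal rectified flow would produce.
v_const = straight_velocity(noisy, clean)
def straight_field(x, t):
    return v_const

# Curved field: a quarter-turn rotation that also carries noisy to clean,
# but along an arc instead of a line.
def curved_field(x, t):
    w = math.pi / 2.0
    return [-w * x[1], w * x[0]]

err_straight_1 = distance(euler_solve(straight_field, noisy, 1), clean)
err_curved_1 = distance(euler_solve(curved_field, noisy, 1), clean)
err_curved_100 = distance(euler_solve(curved_field, noisy, 100), clean)

print("straight field, 1 step  :", err_straight_1)
print("curved field,   1 step  :", err_curved_1)
print("curved field,   100 steps:", err_curved_100)
```

In this toy setting the straight field reaches the target with one function evaluation, while the curved field needs many small steps to approach it. This is the intuition behind encouraging straight transport paths when aiming for single-step enhancement.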