ISCA Archive Interspeech 2023

A Simple RNN Model for Lightweight, Low-compute and Low-latency Multichannel Speech Enhancement in the Time Domain

Ashutosh Pandey, Ke Tan, Buye Xu

Deep learning has led to unprecedented advances in speech enhancement. However, deep neural networks (DNNs) typically require a large amount of computation, memory, signal buffer, and processing time to achieve strong performance. Designing a DNN to meet a given resource constraint requires dedicated effort. This study proposes a novel recurrent neural network (RNN) based model for time-domain multichannel speech enhancement that can be easily tuned to meet a given constraint. We present results of training the model at different scales, where algorithmic latency varies from 1 ms to 16 ms, model size varies from 100 thousand to 25 million parameters, and the compute required to process one second of speech varies from 100 mega to 25 giga multiply-accumulates (MACs). Experimental results demonstrate that the proposed model can obtain similar or better performance with less compute and fewer parameters than competitive approaches to low-latency multichannel speech enhancement.
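
To make the scale ranges above concrete, the following is a minimal arithmetic sketch, not part of the proposed model. It assumes a 16 kHz sampling rate and assumes that the algorithmic latency equals the frame length with non-overlapping frames; neither assumption is stated in the abstract, and the helper names `frame_samples` and `macs_per_frame` are hypothetical.

```python
# Illustrative arithmetic only; the sampling rate and framing are assumptions.

def frame_samples(latency_ms: float, sample_rate_hz: int = 16_000) -> int:
    """Number of samples in a frame whose duration equals the given latency.

    Assumes algorithmic latency == frame length (no look-ahead).
    """
    return round(latency_ms * sample_rate_hz / 1000)


def macs_per_frame(macs_per_second: float, hop_ms: float) -> float:
    """Per-frame MAC budget, given a per-second budget and a frame hop in ms.

    Assumes non-overlapping frames, so there are 1000 / hop_ms frames per second.
    """
    frames_per_second = 1000 / hop_ms
    return macs_per_second / frames_per_second


if __name__ == "__main__":
    # The two ends of the ranges quoted in the abstract.
    for latency_ms, macs_s in [(1.0, 100e6), (16.0, 25e9)]:
        print(
            f"{latency_ms:>4} ms -> {frame_samples(latency_ms)} samples/frame, "
            f"{macs_per_frame(macs_s, latency_ms):.3g} MACs/frame"
        )
```

Under these assumptions, a 1 ms latency corresponds to 16-sample frames and a 16 ms latency to 256-sample frames, which shows how tightly the per-frame compute budget is coupled to the latency target.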