ISCA Archive Interspeech 2023

Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network

Zhiheng Liao, Feifei Xiong, Juan Luo, Minjie Cai, Eng Siong Chng, Jinwei Feng, Xionghu Zhong

This paper presents a generative neural network that estimates the room impulse response (RIR) directly from received reverberant speech in a single-channel scenario. The complex spectrogram of the reverberant speech is fed to an encoder that produces a compact acoustic embedding, which is then passed to a generator that constructs the corresponding time-domain acoustic response. To avoid needing a large model to generate RIRs with many taps, we propose SG-RIR, a novel segmental generative network for blind RIR estimation that splits the RIR into segments and shares the network parameters across segments. Experimental results show that the proposed model estimates the time-domain RIR with a mean error of 0.008 on both simulated and measured RIR test sets. Its effectiveness is further verified by competitive estimation accuracy for two key room acoustic parameters, the reverberation time (RT) and the direct-to-reverberant ratio (DRR), compared with state-of-the-art approaches designed specifically for RT and DRR estimation.
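For readers unfamiliar with the two room acoustic parameters, the following is a minimal sketch of how RT and DRR can be derived from a time-domain RIR once one is available. It is not taken from the paper: it assumes Schroeder backward integration with a line fit between -5 dB and -35 dB (a T30-style estimate extrapolated to 60 dB) for RT, and a 2.5 ms window around the strongest peak as the direct path for DRR. The function names, the window length, and the fit range are illustrative choices, and the paper's own evaluation procedure may differ.

```python
import numpy as np


def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]   # energy remaining from each sample on
    edc = edc / edc[0]                    # normalise so the curve starts at 0 dB
    return 10.0 * np.log10(np.maximum(edc, 1e-12))


def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-35.0):
    """Fit a line to the decay curve between lo_db and hi_db (assumed range)
    and extrapolate the slope to a 60 dB decay."""
    edc_db = schroeder_edc_db(rir)
    idx = np.where((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope


def estimate_drr_db(rir, fs, window_ms=2.5):
    """Ratio of direct-path energy to the remaining energy, in dB.
    The direct path is assumed to lie within window_ms of the largest peak."""
    energy = np.asarray(rir, dtype=float) ** 2
    peak = int(np.argmax(energy))
    half = int(round(window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), peak + half + 1
    direct = energy[start:stop].sum()
    reverb = energy[:start].sum() + energy[stop:].sum()
    return 10.0 * np.log10(direct / max(reverb, 1e-12))


if __name__ == "__main__":
    # Synthetic RIR (hypothetical test signal): exponentially decaying noise
    # whose amplitude drops by 60 dB over 0.5 s.
    fs, rt_true = 16000, 0.5
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    rir = rng.standard_normal(fs) * np.exp(-3.0 * np.log(10.0) * t / rt_true)
    print("RT estimate (s):", estimate_rt60(rir, fs))
    print("DRR estimate (dB):", estimate_drr_db(rir, fs))
```

Applied to both an estimated RIR and its ground truth, such functions would give one possible way to compare RT and DRR errors, under the assumptions stated above.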