ISCA Archive Interspeech 2023

Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions

Prithvi R.R. Gudepu, Jayesh M. Koroth, Kamini Sabu, Mahaboob Ali Basha Shaik

Most online Voice Activity Detection (VAD) models include a Recurrent Neural Network (RNN) component that captures long-term context, which helps improve robustness to noise. These RNN components are static models: they do not make efficient use of the model's predictions from previous frames. In this work, we introduce a new Dynamic Encoder RNN (DE-RNN) that encodes the target speech dynamically to help distinguish target speech from noise. We augmented the RNN component of several established baseline architectures with DE-RNN. In these experiments, DE-RNN improves performance both with background noise and with a secondary competing speaker. All experiments use publicly available datasets.