ISCA Archive Interspeech 2022

Enhancing Embeddings for Speech Classification in Noisy Conditions

Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna

Robustness against noise is critical for several speech applications in real-world environments. To improve robustness, a speech enhancement front-end is typically integrated as a preprocessing stage, often jointly trained with the back-end network to reduce the impact of distortions and artifacts on performance. Recently, speech representations computed by models pre-trained on large amounts of data, such as Wav2Vec, have proved effective in a variety of speech processing and classification tasks. However, the performance of these models, although very robust, deteriorates in the presence of environmental noise. In this paper, we investigate how enhancement can be applied in neural speech classification architectures employing pre-trained speech embeddings. We investigate two approaches: one applies time-domain enhancement prior to extracting the embeddings; the other employs a convolutional neural network to map the noisy embeddings to the corresponding clean ones. Exhaustive experiments on the Fluent Speech Commands and Google Speech Commands corpora, contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, shed light on the most promising enhancement training approaches.
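As a rough illustration of the second approach (mapping noisy embeddings to clean ones with a convolutional model), the sketch below fits a single 1-D convolution kernel over the time axis of a toy embedding sequence. It is not the architecture used in the paper: the mean-squared-error objective, the three-tap kernel, plain gradient descent, the toy data, and all function names (`conv_over_time`, `mse`, `train_kernel`) are assumptions made only to keep the example self-contained.

```python
from typing import List

Frames = List[List[float]]  # toy embedding sequence: [time][embedding_dim]


def conv_over_time(frames: Frames, kernel: List[float]) -> Frames:
    """Apply one shared 1-D kernel along the time axis (zero padding, same length)."""
    half = len(kernel) // 2
    n, dim = len(frames), len(frames[0])
    out = [[0.0] * dim for _ in range(n)]
    for t in range(n):
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < n:
                for d in range(dim):
                    out[t][d] += w * frames[src][d]
    return out


def mse(a: Frames, b: Frames) -> float:
    """Mean squared error between two embedding sequences of equal shape."""
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


def train_kernel(noisy: Frames, clean: Frames, kernel: List[float],
                 lr: float = 0.05, steps: int = 200) -> List[float]:
    """Fit the kernel by gradient descent on the MSE (assumed objective)."""
    half = len(kernel) // 2
    n, dim = len(noisy), len(noisy[0])
    kernel = list(kernel)
    for _ in range(steps):
        out = conv_over_time(noisy, kernel)
        grads = [0.0] * len(kernel)
        for t in range(n):
            for k in range(len(kernel)):
                src = t + k - half
                if 0 <= src < n:
                    for d in range(dim):
                        err = out[t][d] - clean[t][d]
                        grads[k] += 2.0 * err * noisy[src][d] / (n * dim)
        kernel = [w - lr * g for w, g in zip(kernel, grads)]
    return kernel


# Toy data (hypothetical): constant "clean" embeddings plus an alternating perturbation.
clean = [[1.0, 1.0] for _ in range(6)]
noisy = [[1.0 + (0.4 if t % 2 == 0 else -0.4)] * 2 for t in range(6)]

identity = [0.0, 1.0, 0.0]  # leaves the noisy embeddings unchanged
learned = train_kernel(noisy, clean, identity)

loss_before = mse(conv_over_time(noisy, identity), clean)
loss_after = mse(conv_over_time(noisy, learned), clean)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In a realistic setting the noisy and clean embeddings would come from the same pre-trained model applied to paired noisy and clean recordings, and the mapping would be a multi-layer network rather than a single kernel.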