ISCA Archive Interspeech 2023

Instance-based Temporal Normalization for Speaker Verification

Thanathai Lertpetchpun, Ekapol Chuangsuwanich

Speaker verification systems must cope with domain mismatch and with other sources of variation such as language and emotion. Normalization techniques such as Batch Normalization (BN) have proven effective in improving neural network training and are a popular choice in many speaker verification networks. However, BN may not adequately normalize the feature map for speaker verification. In this work, we investigate several instance-based normalization methods that are more suitable for speaker verification. We propose the Temporal Normalization layer, which normalizes along the time dimension, and show its effectiveness on four different datasets. Experiments on VoxCeleb2 show relative improvements of 24.3% and 46.15% in EER and DCF, respectively, over fwSE-ResNet34 on VoxCeleb1-O. Furthermore, we present a systematic evaluation of our networks on three other datasets, namely Thai-Central, THAI-SER, and CREMA-D, to show their robustness to language and emotional variation.
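
To make the idea of normalizing along the time dimension concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact formulation, so several details here are assumptions: the feature map layout `(batch, channels, frequency, time)`, the use of mean and variance statistics, the `eps` constant, the absence of learnable scale and shift parameters, and the function name `temporal_normalize`, which is hypothetical.

```python
import numpy as np


def temporal_normalize(x, eps=1e-5):
    """Standardize a feature map along its time axis (illustrative sketch).

    Assumed layout: (batch, channels, frequency, time). Statistics are
    computed separately for every (instance, channel, frequency) slice,
    so they do not depend on the other utterances in the batch.
    """
    mean = x.mean(axis=-1, keepdims=True)  # per-slice mean over time
    var = x.var(axis=-1, keepdims=True)    # per-slice variance over time
    return (x - mean) / np.sqrt(var + eps)


# Synthetic feature map: 2 utterances, 4 channels, 8 frequency bins, 50 frames.
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 50))
normed = temporal_normalize(feats)

# A constant offset added to every frame is removed by the normalization.
shifted = temporal_normalize(feats + 10.0)
```

Because the statistics come from each utterance alone, the output for one utterance is unaffected by the batch composition, which is the property that distinguishes instance-based methods from BN in this sketch.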