ISCA Archive Interspeech 2024

Enhancing ECAPA-TDNN with Feature Processing Module and Attention Mechanism for Speaker Verification

Shiu-Hsiang Liou, Po-Cheng Chan, Chia-Ping Chen, Tzu-Chieh Lin, Chung-Li Lu, Yu-Han Cheng, Hsiang-Feng Chuang, Wei-Yu Chen

In this paper, we introduce three methods to enhance the state-of-the-art ECAPA-TDNN model for speaker verification: self-calibration (SC), a simple attention mechanism (SimAM), and a front-end module based on modified temporal dynamic convolution (MTDY). The SC module expands the model’s receptive field and improves spatial attention, so that contextual information is captured more effectively. The SimAM attention mechanism assigns a unique weight to each individual neuron, placing greater emphasis on the more informative ones. The MTDY-based front-end module adapts to diverse temporal speech features by using adaptive convolutional kernels, which it aggregates with attention weights to capture temporal variations. Our proposed model, IM ECAPA MTDY-TDNN SimAM, achieves a better trade-off between performance and complexity than recent research works. On the VoxCeleb1-H test set, it achieves a 1.655% EER and 0.157 minDCF with 9.71M parameters and 1.97G FLOPs.
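To illustrate how a SimAM-style mechanism can assign a separate weight to every neuron without adding trainable parameters, the following is a minimal sketch of the commonly cited SimAM formulation, applied to a single channel stored as a flat list. It is an assumption that the model in this paper uses this exact form; the function names `simam_weights` and `simam` and the regularizer `lam` are illustrative and do not come from the paper.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Return one attention weight per activation in a single channel.

    Assumed formulation (generic SimAM, not necessarily the authors' variant):
    activations that deviate more from the channel mean get a larger weight.
    """
    n = len(channel)
    if n < 2:
        raise ValueError("need at least two activations per channel")
    mu = sum(channel) / n
    sq_dev = [(x - mu) ** 2 for x in channel]
    # Variance estimate over the channel, using n - 1 in the denominator.
    var = sum(sq_dev) / (n - 1)
    weights = []
    for d in sq_dev:
        # Inverse energy: larger for activations that stand out in the channel.
        inv_energy = d / (4.0 * (var + lam)) + 0.5
        # Sigmoid squashes the inverse energy into a weight in (0, 1).
        weights.append(1.0 / (1.0 + math.exp(-inv_energy)))
    return weights


def simam(channel, lam=1e-4):
    """Rescale each activation by its own SimAM-style weight."""
    return [x * w for x, w in zip(channel, simam_weights(channel, lam))]
```

In this sketch, the activation that differs most from the rest of its channel receives the largest weight, which is one way to read "greater emphasis on more informative ones"; in a real network the same computation would typically be applied per channel over a tensor rather than over a Python list.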