ISCA Archive Interspeech 2023

Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification

Jiadi Yao, Chengdong Liang, Zhendong Peng, Binbin Zhang, Xiao-Lei Zhang

ECAPA-TDNN is currently one of the state-of-the-art deep models for automatic speaker verification (ASV). However, it focuses mainly on extracting local features within fixed ranges and pays little attention to global features. To address this issue, we propose Branch-ECAPA-TDNN, which uses two parallel branches to extract features at different ranges and levels of abstraction. One branch employs multi-head self-attention to capture long-range dependencies, while the other uses an SE-Res2Block module to model local multi-scale characteristics. To improve feature fusion, we further apply different methods for merging the features of the two branches. Experimental results show that Branch-ECAPA-TDNN achieves relative EER reductions of 24.10% and 7.92% over ECAPA-TDNN on the VoxCeleb and CN-Celeb datasets, respectively.
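To make the parallel-branch idea concrete, the following is a minimal NumPy sketch of one block under stated assumptions; it is not the authors' implementation. Frame-level features are laid out as `(T, C)` (frames by channels). The global branch is standard scaled dot-product multi-head self-attention over time. The local branch is a simplified SE-Res2Block that keeps only the Res2Net-style hierarchical channel split with dilated depthwise 1-D convolutions, followed by squeeze-excitation and a residual connection; it omits the 1x1 convolutions and batch normalization of the real module. The two merge options, `"add"` and `"concat"` (concatenation followed by a linear projection), are hypothetical illustrations; the abstract does not say which merging methods the paper uses. All function names, parameter names, and the random initialization are assumptions made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, heads):
    """Global branch (assumed form): every frame attends to every other frame. x: (T, C)."""
    T, C = x.shape
    d = C // heads
    # Project, then split channels into heads: (heads, T, d)
    q, k, v = ((x @ w).reshape(T, heads, d).transpose(1, 0, 2) for w in (w_q, w_k, w_v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, T, T)
    return (attn @ v).transpose(1, 0, 2).reshape(T, C) @ w_o

def conv1d_same(x, kernel, dilation):
    """Depthwise dilated 1-D conv over time, zero 'same' padding. x: (T, c), kernel: (K, c), K odd."""
    K, T = kernel.shape[0], x.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(K):
        out += xp[i * dilation : i * dilation + T] * kernel[i]
    return out

def se_res2_block(x, kernels, w_se1, w_se2, scale, dilation):
    """Local branch (simplified, hypothetical): multi-scale Res2 convs + squeeze-excitation."""
    chunks = np.split(x, scale, axis=1)  # split channels into `scale` groups
    ys = [chunks[0]]  # first group passes through unchanged
    for i in range(1, scale):
        inp = chunks[i] if i == 1 else chunks[i] + ys[-1]  # hierarchical residual links
        ys.append(np.maximum(conv1d_same(inp, kernels[i - 1], dilation), 0.0))
    y = np.concatenate(ys, axis=1)
    s = y.mean(axis=0)  # squeeze: average over time -> (C,)
    gate = 1.0 / (1.0 + np.exp(-(np.maximum(s @ w_se1, 0.0) @ w_se2)))  # excitation in (0, 1)
    return x + y * gate  # residual connection

def branch_block(x, p, merge="concat"):
    """Run both branches in parallel on the same input and merge them (merge modes are assumptions)."""
    g = multi_head_self_attention(x, p["w_q"], p["w_k"], p["w_v"], p["w_o"], p["heads"])
    l = se_res2_block(x, p["kernels"], p["w_se1"], p["w_se2"], p["scale"], p["dilation"])
    if merge == "add":
        return g + l
    if merge == "concat":
        return np.concatenate([g, l], axis=1) @ p["w_merge"]  # (T, 2C) -> (T, C)
    raise ValueError(f"unknown merge method: {merge}")

def init_params(C, heads=4, scale=4, K=3, dilation=2, se_dim=8, seed=0):
    """Random weights for illustration only (hypothetical sizes and defaults)."""
    rng = np.random.default_rng(seed)
    def w(*shape):
        return rng.standard_normal(shape) / np.sqrt(shape[0])
    return {
        "w_q": w(C, C), "w_k": w(C, C), "w_v": w(C, C), "w_o": w(C, C), "heads": heads,
        "kernels": [w(K, C // scale) for _ in range(scale - 1)],
        "w_se1": w(C, se_dim), "w_se2": w(se_dim, C),
        "scale": scale, "dilation": dilation, "w_merge": w(2 * C, C),
    }

x = np.random.default_rng(1).standard_normal((200, 64))  # 200 frames, 64 channels
out = branch_block(x, init_params(64))
print(out.shape)  # (200, 64)
```

The sketch shows why the two branches are complementary. Each output frame of the convolutional path depends only on a bounded window of neighboring frames, fixed by the kernel size and dilation, whereas self-attention lets every frame draw on the whole utterance. In this simplified version the SE gate does pool over all frames, but only to rescale channels, not to mix information across time.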