This paper presents speaker recognition (SR) systems for text-independent
speaker verification in the cross-lingual (English vs. Persian) task
(Task 2) of the Short-duration Speaker Verification Challenge (SdSVC)
2021.
We describe the ResNet-like and ECAPA-TDNN-like topologies we applied
and analyze multi-session scoring techniques, benchmarking both on the
SdSVC challenge datasets. We review several modifications of the basic
ResNet-like architecture and of the training strategy that improve
speaker verification quality.
In addition, we introduce an alpha query expansion-based technique (αQE)
for aggregating enrollment embeddings at test time. Compared with simply
averaging the embeddings, it improves the minDCF of the ECAPA-TDNN system
by 0.042, from 0.12 to 0.078. Finally, we propose KrTC, a trial-level,
distance-based, non-parametric impostor/target detector that filters out
the worst enrollment samples at test time to further improve system
performance.
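The abstract does not spell out how αQE is applied to enrollment aggregation. The following is a minimal NumPy sketch under one assumed formulation, borrowed from alpha query expansion in image retrieval: the normalized mean of the enrollment embeddings serves as the query, each enrollment embedding is weighted by its cosine similarity to that query raised to the power α, and the weighted sum is re-normalized. The function names and the default α = 3 are hypothetical illustration choices, not the authors' recipe or settings.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit Euclidean length along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def mean_aggregate(enroll: np.ndarray) -> np.ndarray:
    """Baseline: average the L2-normalized enrollment embeddings."""
    return l2_normalize(l2_normalize(enroll).mean(axis=0))


def alpha_qe_aggregate(enroll: np.ndarray, alpha: float = 3.0) -> np.ndarray:
    """Hypothetical alphaQE-style aggregation (assumed variant, not the paper's).

    The normalized mean is the query; every enrollment embedding is a
    neighbor weighted by max(cos(query, e_i), 0) ** alpha, so samples that
    agree with the consensus dominate and outliers are suppressed.
    """
    e = l2_normalize(enroll)                # (n, d) unit-length enrollments
    query = mean_aggregate(enroll)          # (d,) initial query
    sims = np.clip(e @ query, 0.0, None)    # cosine similarities, negatives clipped
    weights = sims ** alpha                 # larger alpha sharpens the weighting
    expanded = query + weights @ e          # the query itself has weight 1
    return l2_normalize(expanded)


def cosine_score(enrolled: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between an aggregated model and a test embedding."""
    return float(l2_normalize(enrolled) @ l2_normalize(test))


if __name__ == "__main__":
    # Toy data: four consistent enrollment samples and one outlier.
    enroll = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.05, 0.0],
                       [0.95, 0.0, 0.05], [0.0, 1.0, 0.0]])
    test = np.array([1.0, 0.0, 0.0])
    print("mean :", cosine_score(mean_aggregate(enroll), test))
    print("alphaQE:", cosine_score(alpha_qe_aggregate(enroll), test))
```

With α = 0 every weight equals 1, the weighted sum points in the same direction as the mean, and the sketch reduces exactly to mean aggregation; increasing α progressively down-weights enrollment samples that disagree with the consensus, such as the outlier in the toy data.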
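The reported gains are in minDCF, the minimum normalized detection cost over all decision thresholds. A minimal sketch of that metric follows; the cost parameters (P_target = 0.01, C_miss = 10, C_fa = 1) are an assumption here, chosen to match what we understand the SdSV Challenge evaluation plan to specify, and should be checked against the official plan. The function name `min_dcf` is hypothetical.

```python
import numpy as np


def min_dcf(target_scores, nontarget_scores,
            p_target=0.01, c_miss=10.0, c_fa=1.0) -> float:
    """Minimum normalized detection cost over all score thresholds.

    A trial is accepted when score >= threshold. The cost parameters are
    assumed defaults, not values taken from the paper.
    """
    tar = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    # Candidate thresholds: every observed score, plus +inf (reject all).
    thresholds = np.concatenate([np.unique(np.concatenate([tar, non])), [np.inf]])
    # Miss rate: fraction of target scores strictly below the threshold.
    p_miss = np.searchsorted(tar, thresholds, side="left") / tar.size
    # False-alarm rate: fraction of non-target scores at or above the threshold.
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    # Normalize by the cost of the best trivial system (accept all or reject all).
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(dcf.min() / norm)


if __name__ == "__main__":
    print(min_dcf([2.0, 3.0], [0.0, 1.0]))  # perfectly separated scores: 0.0
```

Because the metric is normalized, 1.0 corresponds to a system no better than always accepting or always rejecting; on this scale the quoted drop from 0.12 to 0.078 is an absolute gain of 0.042, or a 35% relative reduction.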