This paper presents speaker recognition (SR) systems submitted by the
Speech Technology Center (STC) team to the Far-Field Speaker Verification
Challenge 2020. The challenge comprises three SR tasks: far-field
text-dependent speaker verification from a single microphone array
(Track 1), far-field text-independent speaker verification from a single
microphone array (Track 2), and far-field text-dependent speaker
verification from distributed microphone arrays (Track 3).
We describe the techniques and ideas underlying our best-performing
models. Our experiments with x-vector-based and ResNet-like architectures
show that ResNet-based networks outperform x-vector-based systems.
The submitted systems are fusions of ResNet34-based embedding extractors
trained on 80-dimensional log Mel filter bank energies (MFBs),
post-processed with a U-net-like voice activity detector (VAD). On the
challenge evaluation sets, the best systems for Track 1, Track 2, and
Track 3 achieved 5.08% EER and 0.500 C_det^min, 5.39% EER and 0.541
C_det^min, and 5.53% EER and 0.458 C_det^min, respectively.
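For reference, both reported metrics, the equal error rate (EER) and the normalized minimum detection cost (C_det^min), can be computed from a list of scored verification trials. The minimal Python sketch below is illustrative only: the function name `eer_and_min_dcf` is hypothetical, the cost parameters (P_target = 0.01, C_miss = C_fa = 1) are assumed defaults rather than values stated in this paper, and the EER is approximated at the threshold where the miss and false-alarm rates are closest, without ROC interpolation.

```python
import numpy as np


def eer_and_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Return (EER, normalized minimum detection cost) for verification trials.

    scores: one score per trial; higher means "same speaker" is more likely.
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    p_target, c_miss, c_fa: ASSUMED cost parameters, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_tar = labels.sum()
    n_non = labels.size - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")

    # Sort trials by score; a threshold at position k rejects the k lowest scores.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    sorted_labels = labels[order]

    # Cumulative counts of rejected targets / non-targets for k = 0..N.
    tar_rejected = np.concatenate(([0], np.cumsum(sorted_labels)))
    non_rejected = np.concatenate(([0], np.cumsum(1 - sorted_labels)))
    p_miss = tar_rejected / n_tar
    p_fa = (n_non - non_rejected) / n_non

    # Keep only thresholds that fall between distinct scores (handles ties).
    valid = np.concatenate(([True], sorted_scores[1:] > sorted_scores[:-1], [True]))
    p_miss, p_fa = p_miss[valid], p_fa[valid]

    # EER: operating point where miss and false-alarm rates are closest.
    i = np.argmin(np.abs(p_miss - p_fa))
    eer = (p_miss[i] + p_fa[i]) / 2.0

    # Detection cost at every threshold, normalized by the best trivial system
    # (always accept or always reject).
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    min_dcf = dcf.min() / c_default
    return eer, min_dcf


if __name__ == "__main__":
    # Toy trials: two target and two non-target scores.
    eer, min_dcf = eer_and_min_dcf([0.9, 0.3, 0.6, 0.1], [1, 1, 0, 0])
    print(f"EER = {eer:.2%}, C_det^min = {min_dcf:.3f}")
    # prints "EER = 50.00%, C_det^min = 0.500"
```

Sweeping every threshold between consecutive sorted scores keeps the computation at O(N log N), and masking out positions inside runs of tied scores ensures that only achievable operating points contribute to either metric.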