This paper is a post-evaluation analysis of our efforts in the VOiCES 2019
Speaker Recognition Challenge. All systems in the fixed condition are
based on x-vectors with different features and DNN topologies. The
single best system reaches a minDCF of 0.38 (5.25% EER), and a fusion
of 3 systems yields a minDCF of 0.34 (4.87% EER). We also analyze how
speaker verification (SV) systems have evolved over the last few years,
reporting results on the SITW 2016 Challenge as well. The EER on the core-core
condition of the SITW 2016 Challenge dropped from 5.85% to 1.65% for the system
fusions submitted to SITW 2016 and VOiCES 2019, respectively. The less restrictive
open condition allowed us to use external data for PLDA adaptation
and to achieve a small additional performance improvement. In our submission
to the open condition, we used three x-vector systems and one system
based on i-vectors.
This paper also appears
in session Wed-SS-7-3.