ISCA Archive Interspeech 2015

Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization

Andreas Nautsch, Rahim Saeidi, Christian Rathgeb, Christoph Busch

The biometric and forensic performance of automatic speaker recognition systems degrades under noisy and short probe utterance conditions. Score normalization is an effective tool for taking into account the mismatch between reference and probe utterances. In an adaptive symmetric score normalization scheme for state-of-the-art i-vector recognition systems, a set of cohort speakers is employed to calculate the mean and variance of the impostor scores obtained by comparing the cohort against the reference and probe i-vectors. To deal with real-life conditions, where the quality of the audio recordings in the test phase does not match the enrolment utterance(s) of the speakers, we demonstrate the effectiveness of utilizing a condition-matched cohort set for score normalization. The cohort audio material is shortened and degraded by noise at different reasonable and controlled signal-to-noise ratios according to the expected test conditions, yielding multiple sets of cohorts. Further, we propose automatic cohort pre-selection based on modeling each degradation category. For each i-vector, a quality vector is assigned as the posterior probabilities of the degradation classes. The cohort set is then formed by the i-vectors whose quality vectors have a small KL divergence from those of the reference and probe. Further gains are observed by also including this quality vector in the score calibration.
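The cohort pre-selection and symmetric normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity stands in for whatever back-end scorer is actually used; the KL direction KL(q_target || q_cohort) is an assumption; the helper names and the parameters `n_pre` (size of the KL-preselected cohort) and `top_n` (number of highest cohort scores kept for the adaptive statistics) are hypothetical; and a small floor on the standard deviation is added only to avoid division by zero.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions such as quality vectors."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def cosine_score(a, b):
    """Cosine similarity between two i-vectors (assumed stand-in back-end scorer)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def preselect_cohort(q_target, cohort_q, n_pre):
    """Indices of the n_pre cohort entries whose quality vectors have the
    smallest KL divergence from the target's quality vector (assumed direction)."""
    divs = np.array([kl_divergence(q_target, qc) for qc in cohort_q])
    return np.argsort(divs, kind="stable")[:n_pre]


def cohort_stats(x, q_x, cohort, cohort_q, n_pre, top_n, floor=1e-6):
    """Mean and std of the top_n highest scores of x against its
    quality-matched cohort (the 'adaptive' part of the normalization)."""
    idx = preselect_cohort(q_x, cohort_q, n_pre)
    scores = np.sort([cosine_score(x, cohort[i]) for i in idx])[::-1][:top_n]
    return scores.mean(), max(scores.std(), floor)


def as_norm(ref, probe, q_ref, q_probe, cohort, cohort_q, n_pre=4, top_n=3):
    """Symmetric normalization: average of the reference-side and
    probe-side z-normalized raw scores."""
    raw = cosine_score(ref, probe)
    mu_r, sd_r = cohort_stats(ref, q_ref, cohort, cohort_q, n_pre, top_n)
    mu_p, sd_p = cohort_stats(probe, q_probe, cohort, cohort_q, n_pre, top_n)
    return 0.5 * ((raw - mu_r) / sd_r + (raw - mu_p) / sd_p)


# Toy usage with random data; the three degradation classes are hypothetical.
rng = np.random.default_rng(0)
cohort = rng.normal(size=(8, 5))                  # cohort i-vectors
cohort_q = rng.dirichlet(np.ones(3), size=8)      # cohort quality vectors
ref, probe = rng.normal(size=5), rng.normal(size=5)
q_ref = np.array([0.8, 0.1, 0.1])                 # e.g. mostly "clean"
q_probe = np.array([0.1, 0.6, 0.3])               # e.g. mostly "noisy"
print(as_norm(ref, probe, q_ref, q_probe, cohort, cohort_q))
```

Because each side of the comparison draws its statistics from its own quality-matched cohort, a clean reference and a noisy probe are normalized against different cohort subsets; averaging the two z-normalized terms keeps the score symmetric in reference and probe.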