ISCA Archive Interspeech 2014

Performance factor analysis for the 2012 NIST speaker recognition evaluation

Alvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. Doddington, John J. Godfrey

The 2012 NIST Speaker Recognition Evaluation, held in the autumn of 2012, was designed to examine a variety of factors affecting the performance of automatic speaker recognition systems. Here we examine, for leading systems included in this evaluation, the observed effects on performance of five such factors: the presence in test segment speech of environmental noise or of added synthetic noise (one of three types at one of two intensity levels); the duration of test segment speech; the number and channel type of target speaker training sessions; the type of microphone channel used in test segment speech; and the sex of the target speaker. This evaluation is notable for being the first in the series to examine the effects of added synthetic noise on performance. Crowd noise is observed to have a greater impact than HVAC noise, and single-speaker noise a greater impact than crowd noise. Future evaluation plans are also discussed.

doi: 10.21437/Interspeech.2014-290

Cite as: Martin, A.F., Greenberg, C.S., Stanford, V.M., Howard, J.M., Doddington, G.R., Godfrey, J.J. (2014) Performance factor analysis for the 2012 NIST speaker recognition evaluation. Proc. Interspeech 2014, 1135-1138, doi: 10.21437/Interspeech.2014-290

  author={Alvin F. Martin and Craig S. Greenberg and Vincent M. Stanford and John M. Howard and George R. Doddington and John J. Godfrey},
  title={{Performance factor analysis for the 2012 NIST speaker recognition evaluation}},
  booktitle={Proc. Interspeech 2014},