ISCA Archive Interspeech 2014

Application of convolutional neural networks to speaker recognition in noisy conditions

Mitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer

This paper applies a convolutional neural network (CNN) trained for automatic speech recognition (ASR) to the task of speaker identification (SID). In the CNN/i-vector front end, the sufficient statistics are collected from the outputs of the CNN rather than from the traditional universal background model (UBM). Evaluated on heavily degraded speech data, the CNN/i-vector front end provides performance comparable to the UBM/i-vector baseline. Combining the two approaches, however, yields a 26% improvement in miss rate, considerably outperforming the fusion of two different features in the traditional UBM/i-vector approach. An analysis of the language and channel dependency of the CNN/i-vector approach is also provided to highlight future research directions.
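
The abstract does not spell out how the sufficient statistics are computed. As a rough illustration only, the sketch below accumulates the zeroth- and first-order statistics commonly used in i-vector extraction from a matrix of per-frame class posteriors. It assumes the posteriors could come either from UBM Gaussian components or from the CNN's output units; the function name `sufficient_stats` and the list-based data layout are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def sufficient_stats(
    posteriors: List[List[float]],
    features: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors[t][k]: posterior of class k at frame t. Assumption: these
        may come from UBM components or from CNN output units.
    features[t][d]: acoustic feature vector at frame t.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("need one non-empty posterior row per feature frame")

    num_classes = len(posteriors[0])
    dim = len(features[0])

    zeroth = [0.0] * num_classes                          # N_k = sum_t gamma_t(k)
    first = [[0.0] * dim for _ in range(num_classes)]     # F_k = sum_t gamma_t(k) * x_t

    for gamma_t, x_t in zip(posteriors, features):
        for k, gamma in enumerate(gamma_t):
            zeroth[k] += gamma
            for d, x in enumerate(x_t):
                first[k][d] += gamma * x

    return zeroth, first


if __name__ == "__main__":
    # Two frames, two classes, two-dimensional features (toy values).
    post = [[1.0, 0.0], [0.5, 0.5]]
    feats = [[2.0, 4.0], [1.0, 3.0]]
    n, f = sufficient_stats(post, feats)
    print(n)
    print(f)
```

Under this view, only the source of the posteriors differs between the two front ends, while the accumulation step stays the same.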


doi: 10.21437/Interspeech.2014-172

Cite as: McLaren, M., Lei, Y., Scheffer, N., Ferrer, L. (2014) Application of convolutional neural networks to speaker recognition in noisy conditions. Proc. Interspeech 2014, 686-690, doi: 10.21437/Interspeech.2014-172

@inproceedings{mclaren14_interspeech,
  author={Mitchell McLaren and Yun Lei and Nicolas Scheffer and Luciana Ferrer},
  title={{Application of convolutional neural networks to speaker recognition in noisy conditions}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={686--690},
  doi={10.21437/Interspeech.2014-172},
  issn={2308-457X}
}