ISCA Archive Interspeech 2015

Speaker diarization with i-vectors from DNN senone posteriors

Gregory Sell, Daniel Garcia-Romero, Alan McCree

Motivated by recent gains in speaker identification from incorporating senone posteriors from deep neural networks (DNNs) into i-vector extraction, we examine similar enhancements to speaker diarization with i-vector clustering. We evaluate two DNNs with different numbers of senone targets, each combined with a diagonal- or full-covariance universal background model (UBM), on the multilingual CALLHOME corpus. Results show that the larger DNN with a full-covariance UBM gives the best performance. The improvements depend strongly on the number of speakers in a conversation and less on language. Overall, when combined with resegmentation, the proposed system improves CALLHOME performance to 10.3% diarization error rate (DER).
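
As an illustration of how senone posteriors can enter i-vector extraction, the sketch below accumulates the zeroth- and first-order sufficient statistics for one speech segment, using per-frame DNN senone posteriors as the frame-to-component alignments in place of UBM posteriors. This is the commonly described formulation and is assumed here; the abstract does not specify the exact procedure. The function name `baum_welch_stats` and the toy inputs are hypothetical.

```python
from typing import List, Tuple


def baum_welch_stats(
    posteriors: List[List[float]],
    features: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate sufficient statistics for one segment.

    posteriors[t][k]: posterior of senone k at frame t (assumed to come from a DNN).
    features[t][d]:   acoustic feature d at frame t.
    Returns (zeroth, first), where
      zeroth[k]   = sum_t posteriors[t][k]
      first[k][d] = sum_t posteriors[t][k] * features[t][d]
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("posteriors and features must be non-empty and frame-aligned")

    num_senones = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_senones
    first = [[0.0] * dim for _ in range(num_senones)]

    for gamma_t, x_t in zip(posteriors, features):
        for k, gamma in enumerate(gamma_t):
            zeroth[k] += gamma
            for d, x in enumerate(x_t):
                first[k][d] += gamma * x
    return zeroth, first


# Toy example (hypothetical values): 2 frames, 2 senones, 2-dimensional features.
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
toy_features = [[2.0, 4.0], [1.0, 3.0]]
zeroth, first = baum_welch_stats(toy_posteriors, toy_features)
```

In a full system, these statistics would be centered with the UBM means (and whitened with its diagonal or full covariances) before being passed to the i-vector extractor; those steps are omitted from this sketch.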