ISCA Archive Interspeech 2015

Integrating online i-vector extractor with information bottleneck based speaker diarization system

Srikanth Madikeri, Ivan Himawan, Petr Motlicek, Marc Ferras

Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Coefficients (MFCCs). Features such as i-vectors have been used on longer segments (a minimum of 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial because they model speaker information explicitly. In this paper, the i-vector modelling technique is adapted for use as a short-term feature for diarization by estimating i-vectors over a short window of MFCCs. The Information Bottleneck (IB) approach provides a convenient platform for integrating multiple features for fast and accurate diarization of speech. Speaker models are estimated over a window of 10 frames of speech and used as features in the IB system. Experiments on the NIST RT datasets show an absolute improvement of 3.9% in the best case when i-vectors are used as auxiliary features to MFCCs. Further, discriminative training algorithms such as LDA and PLDA are applied to the i-vectors, yielding a best-case absolute improvement of 5% on the RT datasets.
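
The windowing step described in the abstract (one speaker-level feature per short window of MFCC frames) can be illustrated with a minimal sketch. Only the window length of 10 frames comes from the abstract. The hop of one frame, the function names `windowed_features` and `mean_vector`, and the per-dimension mean used as the extractor are assumptions made for illustration. In particular, the mean is only a stand-in and is not an i-vector; a real system would pass a trained i-vector extractor as the `extract` callable.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def windowed_features(
    frames: Sequence[Frame],
    extract: Callable[[Sequence[Frame]], Sequence[float]],
    win: int = 10,   # window length in frames (from the abstract)
    hop: int = 1,    # window shift in frames (assumed, not stated in the abstract)
) -> List[List[float]]:
    """Apply `extract` to each full window of `win` consecutive frames."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    features = []
    # Only full windows are used; trailing frames that do not fill a window are skipped.
    for start in range(0, len(frames) - win + 1, hop):
        features.append(list(extract(frames[start:start + win])))
    return features


def mean_vector(window: Sequence[Frame]) -> List[float]:
    """Hypothetical stand-in extractor: per-dimension mean of the window (NOT an i-vector)."""
    dim = len(window[0])
    return [sum(frame[d] for frame in window) / len(window) for d in range(dim)]


# Toy input: 12 frames of 2-dimensional "MFCCs".
toy_frames = [[float(i), 2.0 * i] for i in range(12)]
toy_features = windowed_features(toy_frames, mean_vector)
```

With 12 input frames, a window of 10, and a hop of 1, this produces three feature vectors, one per full window. Keeping the extractor as a parameter makes it possible to swap the stand-in for an actual i-vector estimator without changing the windowing logic.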