ISCA Archive ISCSLP 2006

Incorporating Prosodic with Acoustic Information for ISCSLP'2006 Speaker Recognition Evaluation - Robust Cross-Channel Speaker Verification

Wen-Chieh Chang, Ding-Yun Chen, Zi-He Chen, Zhi-Ren Zeng, Yuan-Fu Liao, Yau-Tarng Juang

In this paper, we present our speaker verification (SV) systems for the cross-channel text-independent and text-dependent speaker verification (TI-SV and TD-SV) tasks of the ISCSLP’2006 speaker recognition evaluation (ISCSLP2006-SRE). To address the cross-channel issues and take advantage of the unique characteristics of Mandarin (i.e., a tonal language), prosodic contours are modeled to assist the state-of-the-art spectral feature-based SV systems. Specifically, two approaches are proposed: (1) latent prosody analysis (LPA) for modeling the prosodic behaviors of a speaker and (2) a Gaussian mixture model (GMM) for modeling the dynamics of the pitch and energy contours. Experimental results on the evaluation set of ISCSLP2006-SRE demonstrate that combining prosodic feature-based SV systems with spectral feature-based SV systems outperforms the spectral feature-only SV systems for both the TI-SV and TD-SV tasks.

Keywords: Speaker verification, prosodic information, mean variance normalization and ARMA filtering (MVA), Gaussian mixture model (GMM), test normalization (T-norm), probabilistic latent semantic analysis (PLSA), latent prosody analysis (LPA).
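
The abstract names test normalization (T-norm) and the combination of prosodic and spectral SV systems, but does not say how the scores are combined. As a rough illustration only, the sketch below applies standard T-norm to a trial score and then fuses two subsystem scores with a linear weight. The linear fusion rule, the weight value, and the function names `t_norm` and `fuse` are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Standardize a trial score using the scores that the same test
    utterance obtains against a set of cohort (impostor) models."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def fuse(spectral_score, prosodic_score, weight=0.8):
    """Hypothetical linear fusion: `weight` is the share given to the
    spectral system (an assumed value, not one reported in the paper)."""
    return weight * spectral_score + (1.0 - weight) * prosodic_score


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not evaluation data.
    spectral = t_norm(4.0, [1.0, 3.0])
    prosodic = t_norm(0.5, [0.0, 1.0])
    print(fuse(spectral, prosodic, weight=0.75))
```

In practice the fusion weight would be tuned on held-out development trials, and the cohort models for T-norm would be chosen to match the channel conditions of the test data.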