ISCA Archive Eurospeech 2001

Minimum classification error training for speaker identification using Gaussian mixture models based on multi-space probability distribution

Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura

In our previous work, we proposed a speaker modeling technique that uses spectral and pitch features for text-independent speaker identification, based on Multi-Space Probability Distribution Gaussian Mixture Models (MSD-GMMs). We presented a maximum likelihood (ML) estimation procedure for the MSD-GMM parameters and demonstrated its high recognition performance. In this paper, we describe a minimum classification error (MCE) training procedure for the MSD-GMM speaker models. MCE training is also applied to automatically estimate mixture-dependent stream weights for the spectral and pitch streams. The MCE-trained MSD-GMM speaker models are evaluated on a text-independent speaker identification task. Experimental results show that MCE training of the MSD-GMM parameters significantly reduces identification errors, and that system performance improves further when the spectral and pitch streams are appropriately weighted using MCE training.
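
For readers unfamiliar with MCE training, the following is a minimal sketch of the generic MCE criterion (a misclassification measure passed through a sigmoid loss), combined with a stream-weighted score. It is not taken from the paper: the function names, the smoothing parameters `eta` and `gamma`, and the use of a single weight per stream (rather than the mixture-dependent weights described above) are all assumptions made for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def stream_weighted_score(log_spectral, log_pitch, w_spectral, w_pitch):
    """Combine per-stream log-likelihoods with stream weights.

    Assumption: one weight per stream; the paper estimates
    mixture-dependent weights, which this sketch does not model.
    """
    return w_spectral * log_spectral + w_pitch * log_pitch


def misclassification_measure(scores, true_idx, eta=1.0):
    """Generic MCE misclassification measure d.

    d < 0 means the true speaker outscores a soft maximum of the
    competing speakers; d > 0 indicates a likely identification error.
    `eta` (hypothetical name) controls how closely the soft maximum
    approaches the best competing score.
    """
    g_true = scores[true_idx]
    rivals = [g for i, g in enumerate(scores) if i != true_idx]
    g_rival = (logsumexp([eta * g for g in rivals]) - math.log(len(rivals))) / eta
    return -g_true + g_rival


def mce_loss(d, gamma=1.0):
    """Sigmoid loss: a smooth, differentiable stand-in for the 0/1 error."""
    x = gamma * d
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical discriminant scores (log-likelihoods) for three speakers.
    scores = [-10.0, -12.0, -15.0]
    d = misclassification_measure(scores, true_idx=0)
    print(d, mce_loss(d))
```

Because the sigmoid loss is differentiable in the model parameters and stream weights, it can be minimized by gradient-based updates, which is what makes the criterion usable for discriminative training.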