ISCA Archive Interspeech 2012

Constrained maximum mutual information dimensionality reduction for language identification

Shuai Huang, Glen A. Coppersmith, Damianos Karakos

In this paper we propose Constrained Maximum Mutual Information dimensionality reduction (CMMI), an information-theoretic dimensionality reduction technique. CMMI seeks a projection onto a lower-dimensional space that maximizes the mutual information between the class labels and the projected features, and is optimized via gradient ascent. We introduce supervised and semi-supervised variants of CMMI and compare them with a state-of-the-art dimensionality reduction technique (Minimum/Maximum Rényi's Mutual Information using the Stochastic Information Gradient; MRMI-SIG) on a language identification (LID) task using the CallFriend corpus, with favorable results. CMMI also scales to higher-dimensional data more gracefully than MRMI-SIG, permitting application to datasets for which MRMI-SIG is computationally prohibitive.
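The core idea in the abstract — gradient ascent on a mutual-information objective between class labels and a linear projection of the features — can be illustrated with a toy sketch. This is not the paper's algorithm: the Gaussian plug-in MI estimator, the finite-difference gradient, the unit-norm constraint, and all function names here are illustrative assumptions; the paper's CMMI uses its own estimator, analytic gradient, and constraint set.

```python
import numpy as np

def gaussian_mi(z, y):
    # Plug-in estimate of I(y; z) for a 1-D projection z, assuming z is
    # Gaussian within each class: I = H(z) - H(z|y), where the 0.5*log(2*pi*e)
    # terms cancel, leaving only the log-variance terms.
    eps = 1e-9  # guard against log(0) for degenerate projections
    total = 0.5 * np.log(np.var(z) + eps)
    cond = 0.0
    for c in np.unique(y):
        p_c = np.mean(y == c)
        cond += p_c * 0.5 * np.log(np.var(z[y == c]) + eps)
    return total - cond

def mi_projection_1d(X, y, steps=300, lr=0.2, fd_eps=1e-5, seed=0):
    # Toy gradient ascent on a unit-norm direction w, maximizing the MI
    # estimate between labels y and the 1-D projection X @ w. The gradient
    # is approximated by finite differences for simplicity.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        base = gaussian_mi(X @ w, y)
        grad = np.zeros_like(w)
        for j in range(w.size):
            w_pert = w.copy()
            w_pert[j] += fd_eps
            grad[j] = (gaussian_mi(X @ w_pert, y) - base) / fd_eps
        w += lr * grad
        w /= np.linalg.norm(w)  # renormalize: a stand-in for the constraint
    return w
```

For example, on two Gaussian classes separated only along the first feature dimension, the learned direction `w` concentrates its weight on that informative dimension, since projections along the other dimension carry no label information.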