Gaussian Mixture Models (GMMs) in combination with Support Vector Machine (SVM) classifiers have been shown to give excellent classification accuracy in speaker recognition.
In this work we use this approach for language identification, and we compare its performance with the standard approach based on GMMs.
In the GMM-SVM framework, a GMM is trained for each training or test utterance. Because a model is difficult to train accurately from a short utterance, standard GMMs outperform GMM-SVM models in these conditions.
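As a rough illustration of the per-utterance modeling described above, the following sketch fits a small GMM to each utterance, stacks the component means into a supervector, and trains a linear SVM on those supervectors. This is a minimal toy version with synthetic "acoustic frames" and hypothetical parameter choices (component count, feature dimension), not the authors' actual configuration; it assumes scikit-learn is available.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def utterance(lang_shift, n_frames=200, dim=5):
    # Hypothetical synthetic "acoustic frames": the two languages are
    # simulated as Gaussian clouds with different means.
    return rng.normal(loc=lang_shift, scale=1.0, size=(n_frames, dim))

def supervector(frames, n_components=4):
    # Fit a small per-utterance GMM, then stack its component means
    # into a single fixed-length supervector for the SVM.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          reg_covar=1e-3,
                          random_state=0).fit(frames)
    return gmm.means_.ravel()  # shape: (n_components * dim,)

# Build a toy training set: 10 utterances per "language".
X = np.array([supervector(utterance(0.0)) for _ in range(10)] +
             [supervector(utterance(2.0)) for _ in range(10)])
y = np.array([0] * 10 + [1] * 10)

# A linear SVM separates the supervectors; its hyperplane is the kind
# of discriminative information the paper later feeds back into GMMs.
svm = LinearSVC(C=1.0).fit(X, y)
```

Note that each supervector has fixed length regardless of utterance duration, which is what lets variable-length utterances be compared in a single SVM feature space; with very short utterances the per-utterance GMM estimates degrade, which is the weakness the abstract discusses.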
To overcome this limitation, we present an extremely fast discriminative training procedure for GMMs that exploits the separation hyperplanes estimated by an SVM classifier. We show that our discriminative GMMs yield a considerable improvement over standard GMMs and outperform the GMM-SVM approach on short utterances, achieving state-of-the-art performance among acoustic-only systems.