This paper proposes a new automatic language identification method that applies sparse representation to i-vectors in the low-dimensional total variability space. It builds on recently proposed i-vector based language recognition systems. In the proposed method, an over-complete dictionary is first constructed by randomly sampling the low-dimensional total variability space after Within-Class Covariance Normalization (WCCN) and Linear Discriminant Analysis (LDA). For each test sample, the classification score is then derived from its sparse linear representation with respect to this over-complete dictionary. Furthermore, a random subspace method, which combines different sparse representation classifiers, is introduced to address possible over-fitting and to improve the robustness of the estimation. Evaluations on the NIST LRE 2007 dataset show that the proposed method outperforms the state-of-the-art i-vector based language recognition system. In particular, for the 30s test condition, the proposed method achieves a relative reduction of 29.6% in Equal Error Rate (EER) compared with the baseline system.
Index Terms: language recognition, i-vector, sparse representation
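
To make the pipeline summarized above more concrete, the following is a minimal sketch of sparse representation classification over a labeled over-complete dictionary, combined across random subspaces. It is an illustration under stated assumptions, not the system evaluated in this paper: the sparse code is obtained with a simple greedy orthogonal matching pursuit (swapped in here for whatever sparse solver is actually used), the per-language score is assumed to be the negative class-wise reconstruction residual, and the random subspaces are assumed to be random subsets of feature dimensions. The function names (`omp`, `src_scores`, `random_subspace_scores`) and the synthetic vectors standing in for WCCN/LDA-projected i-vectors are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse coding: approximate y using at most n_nonzero columns of D."""
    coef = np.zeros(D.shape[1])
    support, residual, sol = [], y.copy(), np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef

def src_scores(D, labels, y, n_classes, n_nonzero=3):
    """Assumed scoring rule: negative residual when only one class's atoms are kept."""
    coef = omp(D, y, n_nonzero)
    scores = np.empty(n_classes)
    for c in range(n_classes):
        mask = labels == c
        scores[c] = -np.linalg.norm(y - D[:, mask] @ coef[mask])
    return scores

def random_subspace_scores(D, labels, y, n_classes, n_subspaces=10,
                           subspace_dim=None, n_nonzero=3, seed=0):
    """Average the scores of classifiers built on random subsets of dimensions."""
    rng = np.random.default_rng(seed)
    dim = D.shape[0]
    k = subspace_dim if subspace_dim is not None else max(1, dim // 2)
    total = np.zeros(n_classes)
    for _ in range(n_subspaces):
        idx = rng.choice(dim, size=k, replace=False)
        # Length-normalize atoms and the test vector inside the subspace.
        Ds = D[idx] / (np.linalg.norm(D[idx], axis=0, keepdims=True) + 1e-12)
        ys = y[idx] / (np.linalg.norm(y[idx]) + 1e-12)
        total += src_scores(Ds, labels, ys, n_classes, n_nonzero)
    return total / n_subspaces

# Synthetic stand-in data: 3 "languages", 8-dimensional vectors, 10 atoms each,
# so the dictionary has 30 columns in 8 dimensions (over-complete).
rng = np.random.default_rng(0)
means = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                  [1, -1, 1, -1, 1, -1, 1, -1],
                  [-1, -1, 1, 1, -1, -1, 1, 1]], dtype=float)
labels = np.repeat(np.arange(3), 10)
D = (means[labels] + 0.02 * rng.normal(size=(30, 8))).T  # shape (8, 30)
y = means[1] + 0.02 * rng.normal(size=8)                 # test vector from class 1

scores = random_subspace_scores(D, labels, y, n_classes=3, subspace_dim=6)
predicted = int(np.argmax(scores))
print(scores, predicted)
```

In this sketch the language whose atoms best reconstruct the test vector receives the highest score, and averaging over several subspaces reduces the dependence of the decision on any single dictionary view.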