ISCA Archive Interspeech 2014

Modified-prior i-vector estimation for language identification of short duration utterances

Ruchir Travadi, Maarten Van Segbroeck, Shrikanth S. Narayanan

In this paper, we address the problem of Language Identification (LID) on short duration segments. Current state-of-the-art LID systems typically employ total variability i-Vector modeling to obtain a fixed-length representation of each utterance. However, when an utterance is short, only a small amount of data is available, and the estimated i-Vector representation consequently exhibits significant variability, which makes the identification problem challenging. We propose novel techniques that modify the standard normal prior distribution of the i-Vectors in order to obtain a more discriminative i-Vector extraction from the small amount of available utterance data. The proposed i-Vector estimation techniques improved performance on short segments of the DARPA RATS corpora, with durations as short as 3 seconds.
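To make the role of the prior concrete, the following is a minimal sketch of why the prior matters more for short utterances. It is not the method proposed in the paper, since the abstract does not say how the prior is modified. It assumes a deliberately simplified one-dimensional, single-component model in which each frame is x = m + t*w + noise, with a Gaussian prior on the scalar i-vector w. The function name `ivector_posterior` and all parameter names and values are hypothetical, chosen only for illustration.

```python
def ivector_posterior(frames, m, t, sigma2, prior_mean=0.0, prior_var=1.0):
    """Posterior mean and variance of a scalar i-vector w.

    Assumed toy model (not taken from the paper):
        x = m + t * w + noise,  noise ~ N(0, sigma2)
        w ~ N(prior_mean, prior_var)
    The defaults correspond to the standard normal prior.
    """
    n = len(frames)                            # zeroth-order statistic
    first_order = sum(x - m for x in frames)   # centered first-order statistic
    # Posterior precision: prior precision plus the data term, which grows with n.
    precision = 1.0 / prior_var + n * t * t / sigma2
    # Posterior mean: precision-weighted combination of the prior mean and the data.
    mean = (prior_mean / prior_var + t * first_order / sigma2) / precision
    return mean, 1.0 / precision


# Hypothetical data: a short utterance (2 frames) and a longer one (200 frames).
short_frames = [1.0, 1.0]
long_frames = [1.0] * 200

# Standard normal prior versus an assumed modified prior N(1.0, 0.5).
short_std = ivector_posterior(short_frames, m=0.0, t=1.0, sigma2=1.0)
short_mod = ivector_posterior(short_frames, m=0.0, t=1.0, sigma2=1.0,
                              prior_mean=1.0, prior_var=0.5)
long_std = ivector_posterior(long_frames, m=0.0, t=1.0, sigma2=1.0)
long_mod = ivector_posterior(long_frames, m=0.0, t=1.0, sigma2=1.0,
                             prior_mean=1.0, prior_var=0.5)
```

In this toy setting the data term in the posterior precision scales with the number of frames, so for the short utterance the estimate depends strongly on the chosen prior and keeps a large posterior variance, while for the long utterance the two priors give nearly the same estimate. That is the general reason a better-matched prior can help most when little data is available.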