A segment-based Automatic Language Identification (ALI) system has been developed. The system was designed around a formal probabilistic framework, which serves as the basis for investigating the ALI approach proposed by House and Neuburg, an approach that exploits the phonotactic constraints of languages. The system incorporates separate components that model the phonotactic, prosodic, and acoustic properties of the languages to be identified. It was trained and tested on the OGI Multi-Language Telephone Speech Corpus and achieved an overall performance of 47.7% in identifying the language of test utterances.
Keywords: Automatic language identification.