In speech technology, we found several challenges in automatic speech transcription system for multilingual conferences or meetings. Firstly, the dialog occurs between native and non-native speakers. Secondly, the non-native speakers come from different parts of the world (e.g., English spoken by native French speakers or English spoken by native Vietnamese speakers, etc.). Thirdly, no data or a limited amount of data is available to bootstrap the acoustic modeling. This paper presents some autonomous online and offline acoustic model adaptation approaches, which required no additional data in the adaptation process, to deal with above challenges as well as to improve the performance of the phone recognizers used for automatic transcription purpose. Experiments show that our adaptation approach (online interpolation with MLLR based on PRVSM) can provide about 4% absolute gain in Phone Accuracy Rate (PAR) compared to the multilingual baseline system and it is even better than the performance of the supervised monolingual systems.
Index Terms: ASR, multilingual acoustic modeling, language label voting, PR-VSM, MLLR.