ISCA Archive Interspeech 2007

An evaluation of cross-language adaptation and native speech training for rapid HMM construction based on very limited training data

Xufang Zhao, Douglas O'Shaughnessy

As the needs and opportunities for speech technology applications in a variety of languages have grown, methods for rapid transfer of speech technology across languages have become a practical concern. Previous work has focused on comparing different adaptation algorithms, for example MAP (Maximum A Posteriori), Bootstrap, and MLLR (Maximum Likelihood Linear Regression), as used in speaker adaptation. However, as the adaptation corpus grows, the performance of direct native speech training may come to exceed that of cross-language adaptation. If this is true, there should be a threshold adaptation corpus size beyond which native speech training is preferable. In general, transferring acoustic knowledge is useful when not enough training data is available. This paper presents a systematic comparison of the relative effectiveness of cross-language adaptation and native speech training, using transfer from English to Mandarin as a test case. The study found that cross-language adaptation does not produce better acoustic models than direct native speech training, even when the training data is limited.