ISCA Archive ICSLP 2002

Mutual information phone clustering for decision tree induction

Ciprian Chelba, Rachel Morton

The paper presents an automatic method for devising the question sets used to induce classification and regression trees. The algorithm is the well-known bottom-up clustering based on mutual information, applied to phone bigram statistics. The sets of phones at the nodes of the resulting binary tree are used as question sets for clustering context-sensitive (triphone) HMM output distributions in a large-vocabulary speech recognizer. Evaluated on several tasks with a Spanish and a German system, the algorithm performs as well as, and sometimes significantly better than, question sets devised by human experts. It eliminates the need for linguistic expertise and also provides a faster solution.
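The bottom-up clustering step can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes the standard greedy criterion from class-based n-gram clustering (in the style of Brown et al.): start with every phone in its own class, and at each step merge the pair of classes whose merge keeps the highest average mutual information between the classes of adjacent phones. The function names `class_mi` and `cluster_phones` are hypothetical. Every merged set becomes a node of the binary tree and therefore a candidate question such as "is the left (or right) context phone in S?".

```python
import math
from collections import Counter
from itertools import combinations


def class_mi(bigrams, assign):
    """Average mutual information between the classes of adjacent phones:
    I = sum_{c1,c2} p(c1,c2) * log(p(c1,c2) / (p_left(c1) * p_right(c2)))."""
    joint = Counter()
    for (a, b), n in bigrams.items():
        joint[(assign[a], assign[b])] += n
    total = sum(joint.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in joint.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in joint.items():
        # p / (p_l * p_r) simplifies to n * total / (left * right)
        mi += (n / total) * math.log(n * total / (left[c1] * right[c2]))
    return mi


def cluster_phones(phone_seqs):
    """Greedy bottom-up clustering (hypothetical sketch).

    Returns the merged phone sets in merge order; together with the
    singleton leaves they form the binary tree whose nodes serve as
    candidate question sets."""
    bigrams = Counter()
    for seq in phone_seqs:
        bigrams.update(zip(seq, seq[1:]))
    phones = sorted({p for seq in phone_seqs for p in seq})
    classes = {i: frozenset([p]) for i, p in enumerate(phones)}
    assign = {p: i for i, p in enumerate(phones)}
    merges, next_id = [], len(phones)
    while len(classes) > 1:
        best = None
        # Try every pair of current classes; keep the merge that retains the most MI.
        for c1, c2 in combinations(sorted(classes), 2):
            trial = {p: (c1 if c == c2 else c) for p, c in assign.items()}
            mi = class_mi(bigrams, trial)
            if best is None or mi > best[0]:
                best = (mi, c1, c2)
        _, c1, c2 = best
        merged = classes.pop(c1) | classes.pop(c2)
        classes[next_id] = merged
        for p in merged:
            assign[p] = next_id
        merges.append(merged)
        next_id += 1
    return merges
```

As a toy usage example, `cluster_phones([["p", "a", "t", "e", "p", "e", "t", "a", "p"]])` first groups the two vowels `{a, e}`, then the two consonants `{p, t}`, and finally the root set of all four phones. These are phones with interchangeable contexts. The exhaustive pair search that recomputes the MI from scratch is quadratic in the number of classes per step. That is acceptable for a phone inventory of a few dozen symbols, although practical implementations typically update the MI incrementally.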