This paper describes a method for robustly training context-dependent phone HMMs with multiple-component Gaussian mixture output distributions, without the need for a posteriori smoothing. The method clusters acoustically similar states within each allophone set and then ties them, balancing model complexity against the available training data. The operational properties of the method are studied, and results are presented for phone recognition on TIMIT. The method is shown to be robust, to give good recognition performance, and to reduce computation in both recognition and training. All experiments were performed using the HTK portable HMM toolkit.
Keywords: HMM, state clustering, phone recognition, TIMIT, HTK
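The cluster-then-tie idea can be illustrated with a small sketch. The abstract does not give the clustering criterion, so everything here is an assumption rather than the paper's exact procedure: each state is summarised by a single-Gaussian mean and a training-frame occupancy count, states are grouped by agglomerative furthest-neighbour clustering on Euclidean distance between means, and two hypothetical parameters, `max_dist` (merge threshold) and `min_occ` (minimum training data per tied state), control the trade-off between model complexity and available data. The names `State` and `cluster_states` are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class State:
    """One HMM state of one allophone (hypothetical representation)."""
    name: str
    mean: list[float]  # mean of an assumed single-Gaussian approximation to the state
    occ: float         # training-frame occupancy count for this state


def state_distance(a: State, b: State) -> float:
    # Assumed metric: Euclidean distance between state means.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a.mean, b.mean)))


def cluster_distance(c1: list[State], c2: list[State]) -> float:
    # Furthest-neighbour (complete-linkage) distance between two clusters.
    return max(state_distance(a, b) for a in c1 for b in c2)


def occupancy(cluster: list[State]) -> float:
    return sum(s.occ for s in cluster)


def cluster_states(states: list[State], min_occ: float, max_dist: float) -> list[list[str]]:
    """Group the states of one allophone set; each group would then share (tie) one state."""
    clusters = [[s] for s in states]

    # Phase 1: repeatedly merge the closest pair of clusters while it is within max_dist.
    while len(clusters) > 1:
        d, i, j = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    # Phase 2: merge any cluster with too little training data into its nearest neighbour.
    while len(clusters) > 1:
        k = min(range(len(clusters)), key=lambda idx: occupancy(clusters[idx]))
        if occupancy(clusters[k]) >= min_occ:
            break
        m = min(
            (idx for idx in range(len(clusters)) if idx != k),
            key=lambda idx: cluster_distance(clusters[k], clusters[idx]),
        )
        i, j = sorted((k, m))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return [[s.name for s in c] for c in clusters]


# Usage with made-up two-dimensional means and counts:
states = [
    State("s1", [0.0, 0.0], 50),
    State("s2", [0.1, 0.0], 30),
    State("s3", [5.0, 5.0], 200),
    State("s4", [5.2, 5.0], 10),
]
print(cluster_states(states, min_occ=50, max_dist=0.5))  # [['s1', 's2'], ['s3', 's4']]
```

In this sketch the second phase is what ties model size to the data: raising `min_occ` (to 100 in the example above) forces the under-trained `s1`/`s2` group to merge with its neighbour, yielding fewer but better-trained tied states.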