ISCA Archive ICSLP 2002

State clustering improvements for continuous HMMs in a Spanish large vocabulary recognition system

R. Córdoba, J. Macías-Guarasa, J. Ferreiros, J. M. Montero, José M. Pardo

In this paper we present a set of improvements applied to a large-vocabulary isolated-word recognition system based on continuous HMMs. The system has been used in the EU-funded IDAS project (LE4-8315), in which an automated interactive telephone-based directory assistance service was developed. The improvements cover both the reestimation of continuous HMMs and the agglomerative clustering of context-dependent models, all of them applied to our Spanish database. Specifically, we show how a new distance between states can greatly improve the performance of the clustering process. We also describe a new clustering strategy, based on multiple-Gaussian clustering, which further improved the results. Finally, we present a new way to find the optimum number of Gaussians for each state, which can be applied to both context-dependent and context-independent models.
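
To make the idea of agglomerative state clustering concrete, the following is a minimal sketch, not the method of the paper. It assumes each HMM state is summarized by a single diagonal-covariance Gaussian and uses a symmetric Kullback-Leibler divergence as the distance between states; the paper's new distance and its multiple-Gaussian strategy are not specified in this abstract, so the distance here is only a generic stand-in. The function names (`sym_kl`, `merge`, `cluster_states`) and the occupancy-weighted merge are illustrative assumptions.

```python
import numpy as np

def sym_kl(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal Gaussians
    (assumed stand-in distance, not the one proposed in the paper)."""
    d = m1 - m2
    return 0.5 * np.sum(v1 / v2 + v2 / v1 - 2.0 + d * d * (1.0 / v1 + 1.0 / v2))

def merge(c1, c2):
    """Merge two clusters by moment matching, weighted by occupancy counts."""
    ids1, m1, v1, n1 = c1
    ids2, m2, v2, n2 = c2
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Second moment of the pooled data minus the squared pooled mean.
    var = (n1 * (v1 + m1 ** 2) + n2 * (v2 + m2 ** 2)) / n - mean ** 2
    return (ids1 + ids2, mean, var, n)

def cluster_states(means, variances, counts, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters
    until only n_clusters remain. Returns lists of original state indices."""
    clusters = [
        ([i], np.asarray(means[i], dtype=float),
         np.asarray(variances[i], dtype=float), float(counts[i]))
        for i in range(len(means))
    ]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sym_kl(clusters[a][1], clusters[a][2],
                              clusters[b][1], clusters[b][2])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = merge(clusters[a], clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(c[0]) for c in clusters]

# Toy usage: four 2-D states forming two obvious groups.
means = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
variances = [[1.0, 1.0]] * 4
counts = [1, 1, 1, 1]
groups = cluster_states(means, variances, counts, n_clusters=2)
```

In a real system the stopping criterion would more likely be a distance threshold or a minimum occupancy per cluster than a fixed cluster count; the fixed count is used here only to keep the sketch short.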