ISCA Archive Eurospeech 1997

Decision-tree based quantization of the feature space of a speech recognizer

Mukund Padmanabhan, L. R. Bahl, D. Nahamoo, Pieter de Souza

We present a decision-tree based procedure to quantize the feature space of a speech recognizer, with the aim of reducing the computation time required to evaluate Gaussians in a speech recognition system. The entire feature space is quantized into non-overlapping regions, each bounded by a number of hyperplanes. Further, each region is characterized by the occurrence of only a small number of the total alphabet of allophones (sub-phonetic speech units); by identifying the region in which a test feature vector lies, only the Gaussians that model the density of the allophones occurring in that region need be evaluated. The quantization of the feature space is done in a hierarchical manner using a binary decision tree. Each node of the decision tree represents a region of the feature space and is further characterized by a hyperplane (a vector v_n and a scalar threshold value h_n) that subdivides the region corresponding to the current node into two non-overlapping regions corresponding to the two children of the current node. Given a test feature vector, finding the region in which it lies involves traversing this binary decision tree, which is computationally inexpensive. We present experimental results showing that the Gaussian computation time can be reduced by as much as a factor of 20 with negligible degradation in accuracy.
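
The traversal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Node` and `find_region`, the convention that a vector goes to the left child when its projection onto v_n is below h_n, and the toy allophone labels are all assumptions made for illustration; the abstract does not say how the hyperplanes are chosen or which side of the threshold maps to which child.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass
class Node:
    # Internal nodes carry a hyperplane (v, h); leaves carry an allophone shortlist.
    v: Optional[Sequence[float]] = None
    h: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    allophones: FrozenSet[str] = frozenset()


def find_region(root: Node, x: Sequence[float]) -> Node:
    """Walk from the root to the leaf whose region contains x."""
    node = root
    while node.left is not None and node.right is not None:
        # One dot product and one comparison per level of the tree.
        projection = sum(vi * xi for vi, xi in zip(node.v, x))
        # Assumed convention: below the threshold goes left, otherwise right.
        node = node.left if projection < node.h else node.right
    return node


# Hypothetical two-dimensional tree with three leaf regions.
leaf_a = Node(allophones=frozenset({"AA_1", "AA_2"}))
leaf_b = Node(allophones=frozenset({"S_1"}))
leaf_c = Node(allophones=frozenset({"S_1", "T_1"}))
inner = Node(v=(0.0, 1.0), h=1.0, left=leaf_b, right=leaf_c)
root = Node(v=(1.0, 0.0), h=0.0, left=leaf_a, right=inner)

# Only Gaussians belonging to these allophones would then be evaluated.
shortlist = find_region(root, (2.0, 0.5)).allophones
```

The cost of locating a region is one inner product per tree level, which is why the lookup is cheap compared with evaluating every Gaussian in the system.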