ISCA Archive ICSLP 2002

Improving latent semantic indexing based classifier with information gain

Li Li, Wu Chou

In this paper, we describe an approach that uses a discriminative term selection process based on information gain (IG) to improve the performance of latent semantic indexing (LSI). The discriminative power of a term is measured by entropy variations averaged over all categories, conditioned upon whether the term is present or absent. The proposed approach is applied to the task of natural language call routing (NLCR), where natural-language-based classifiers are used to route calls to desired destinations. In experimental studies, significant performance gains of 27% in precision and 26.5% in recall are observed. Most importantly, the proposed approach is almost independent of task-dependent language resources and robust to term variations, making it highly portable to various information retrieval and natural language understanding tasks.
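To make the term-scoring idea concrete, the following is a minimal sketch of information gain as it is commonly defined for text categorization: IG(t) = H(C) - P(t) H(C | t present) - P(t absent) H(C | t absent). This is an assumption about the general form; the exact definition and any normalization used in the paper may differ. The function names, the toy requests, and the category labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty partition contributes no entropy
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Reduction in category entropy from knowing whether `term` occurs.

    docs   : list of sets of terms, one set per request
    labels : list of category labels, aligned with docs
    """
    n = len(docs)
    overall = Counter(labels)
    present = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    absent = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n_present = sum(present.values())
    n_absent = n - n_present
    return (
        entropy(overall.values())
        - (n_present / n) * entropy(present.values())
        - (n_absent / n) * entropy(absent.values())
    )


# Toy, made-up routing requests (hypothetical data, not from the paper).
docs = [
    {"check", "my", "balance"},
    {"balance", "on", "my", "account"},
    {"report", "my", "lost", "card"},
    {"my", "card", "was", "stolen"},
]
labels = ["balance", "balance", "card", "card"]

ig_balance = information_gain(docs, labels, "balance")  # separates the classes
ig_my = information_gain(docs, labels, "my")            # occurs in every request
```

In this toy setup, a term that occurs in every request, such as "my", carries no information about the destination and would be a natural candidate for removal before the term-document matrix is built for LSI, whereas "balance" separates the two categories perfectly.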