ISCA Archive ISCSLP 2000

Optimization of N-gram Parameters for Natural Language Processing

Gongjun Li, Na Dong, Toshiro Ishikawa

In this paper we present a drawback of conventional approaches to n-gram estimation in Chinese natural language processing: the n-gram parameters are optimized independently of the model's discriminative capability. To address this problem, we propose a discriminative estimation criterion under which the n-gram parameters can be optimized. We implement this approach on a platform for converting Chinese pinyin to Chinese characters. We conduct experiments on the tagged text corpus from Peking University. Experimental results show that the conversion rate can be raised remarkably, by up to 41.4%.
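
For orientation, the kind of conventional baseline the abstract refers to can be sketched as a bigram model estimated by maximum likelihood and used to choose characters for a pinyin sequence. This is only an illustrative sketch, not the authors' system: the candidate table, the toy corpus, the add-one smoothing, and the function names (`train_bigram`, `log_prob`, `convert`) are all assumptions made for the example. The discriminative criterion itself is not specified in the abstract and is not implemented here.

```python
import math
from collections import defaultdict

# Hypothetical toy data: candidate characters for each pinyin syllable.
CANDIDATES = {"ta": ["他", "她"], "shi": ["是", "事"]}

# Hypothetical toy training corpus, one list of characters per sentence.
CORPUS = [["他", "是"], ["他", "是"], ["她", "是"], ["事"]]

# Number of distinct characters in the toy corpus (used for smoothing).
VOCAB_SIZE = 4


def train_bigram(corpus):
    """Collect context counts and bigram counts, with "<s>" as sentence start."""
    unigram = defaultdict(int)
    bigram = defaultdict(int)
    for sentence in corpus:
        prev = "<s>"
        for ch in sentence:
            unigram[prev] += 1
            bigram[(prev, ch)] += 1
            prev = ch
    return unigram, bigram


def log_prob(prev, ch, unigram, bigram, vocab_size):
    """Add-one smoothed log P(ch | prev); the smoothing is an assumption."""
    return math.log((bigram[(prev, ch)] + 1) / (unigram[prev] + vocab_size))


def convert(pinyin, unigram, bigram, vocab_size):
    """Viterbi search for the most probable character sequence."""
    # best maps the last character to (log score, path ending in it).
    best = {"<s>": (0.0, [])}
    for syllable in pinyin:
        new_best = {}
        for ch in CANDIDATES[syllable]:
            new_best[ch] = max(
                (score + log_prob(prev, ch, unigram, bigram, vocab_size),
                 path + [ch])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


unigram, bigram = train_bigram(CORPUS)
result = convert(["ta", "shi"], unigram, bigram, VOCAB_SIZE)
print("".join(result))
```

The counts here are fitted to the training text alone, without regard to how well the model separates the correct character sequence from competing candidates for the same pinyin. That is the independence from discriminative capability that the abstract identifies as the drawback.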