ISCA Archive Interspeech 2006

Thesaurus expansion using similar word pairs from patent documents

Yoshimi Suzuki, Fumiyo Fukumoto

In both written and spoken language, we sometimes use different words to express the same meaning. For instance, "constraint" (seigen) and "restriction" (seiyaku) are used with the same meaning. This makes text classification and text summarization difficult. To deal with this problem, dictionaries, especially thesauri, are used. However, technical papers and patent documents contain many new words that are not listed in dictionaries. In this paper, we propose a method for accurately extracting words that are semantically similar to each other. Using this method, we extracted similar word pairs from patent documents. We also expanded a thesaurus using the extracted similar words.