ISCA Archive Eurospeech 1993

A fast multilingual probabilistic tagger

Evangelos Dermatas, George Kokkinakis

This paper presents and compares two versions of a novel automatic tagging system that is independent of both language and tagset and responds in close to real time on personal computers. The system's prediction model is based on HMM chain theory and tags each word of a text, including unknown words, using the Viterbi algorithm. The first version carries out floating-point arithmetic operations, while in the second version these operations have been transformed into fixed-point ones. The transformation achieves a significant reduction in response time with negligible influence (<0.01%) on the prediction accuracy. The tagging system was tested on newspaper texts in seven European languages, using various sets of grammatical categories and texts with and without unknown words. The results proved to be satisfactory.

Keywords: Probabilistic tagging, taggers, Viterbi algorithm, HMM, natural language processing.
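
The abstract does not spell out the model or the fixed-point transformation, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a bigram HMM, stores probabilities as negative log costs, and obtains a fixed-point variant by scaling and rounding those costs to integers, so that decoding needs only additions and comparisons. The scale factor `SCALE`, the flat unknown-word probability `UNK_PROB`, the helper names, and all toy probabilities are hypothetical and chosen for illustration.

```python
import math

SCALE = 1024      # hypothetical fixed-point scale factor (assumption)
UNK_PROB = 1e-4   # hypothetical flat emission probability for unknown words

def cost(p, fixed):
    """Negative log probability, as a float or as a scaled, rounded integer."""
    c = -math.log(p)
    return round(c * SCALE) if fixed else c

def build_costs(start, trans, emit, fixed):
    """Convert probability tables to cost tables once, before decoding."""
    return {
        "start": {t: cost(p, fixed) for t, p in start.items()},
        "trans": {a: {b: cost(p, fixed) for b, p in row.items()}
                  for a, row in trans.items()},
        "emit": {t: {w: cost(p, fixed) for w, p in row.items()}
                 for t, row in emit.items()},
        "unk": cost(UNK_PROB, fixed),
    }

def viterbi(words, costs):
    """Return the lowest-cost (most probable) tag sequence for words."""
    tags = list(costs["start"])

    def emit_cost(tag, word):
        # Words missing from a tag's lexicon fall back to the unknown-word cost.
        return costs["emit"][tag].get(word, costs["unk"])

    # best[t]: lowest cost of any tag path ending in tag t at the current word
    best = {t: costs["start"][t] + emit_cost(t, words[0]) for t in tags}
    back = []  # one back-pointer table per word after the first
    for word in words[1:]:
        step, ptr = {}, {}
        for t in tags:
            prev = min(tags, key=lambda p: best[p] + costs["trans"][p][t])
            step[t] = best[prev] + costs["trans"][prev][t] + emit_cost(t, word)
            ptr[t] = prev
        best = step
        back.append(ptr)

    # Follow the back-pointers from the best final tag.
    path = [min(tags, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy model with invented probabilities (not taken from the paper).
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.05},
}

float_costs = build_costs(START, TRANS, EMIT, fixed=False)
fixed_costs = build_costs(START, TRANS, EMIT, fixed=True)

sentence = ["the", "dog", "barks"]
print(viterbi(sentence, float_costs))  # ['DET', 'NOUN', 'VERB']
print(viterbi(sentence, fixed_costs))  # ['DET', 'NOUN', 'VERB']
```

In this sketch the two variants share the same decoder and differ only in how the cost tables are built. Rounding to integers can change the chosen path only when two candidate paths have nearly equal cost, which is consistent with the small accuracy difference reported above, although the sketch says nothing about the actual speed-up on the authors' hardware.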