ISCA Archive ICSLP 1992
ISCA Archive ICSLP 1992

Multilingual phoneme to grapheme conversion system based on HMM

P. A. Rentzepopoulos, George K. Kokkinakis

A phoneme to grapheme conversion system which is based on Hidden Markov Models and on the Viterbi algorithm is presented. Being an entirely statistical method, the conversion does not need a dictionary, resulting in a system which can handle any word of a given language. The language model is created from a verticalized text which contains both the graphemic and the phonemic form of each word. The size of the training text sufficient for a good performance of the algorithm is not big (the transition matrix is saturated after about 30-40 thousand words). It must also be noted that there is no need for an expert for the training since this is done automatically. Considering that the training procedure is done off line, the proposed algorithm has many advantages: it is fast, it does not need large amount of memory and responds linearly to the length of the input word. Furthermore it does not need a language model to disambiguate homophones. Finally it is mainly language independent and so it can be applied to any language with only minor modifications. This system was initially tested on the Greek language. Afterwards, the system was also tested on Italian, French, English and German with various success rates depending on the language specific parameters. In this paper, the results of the tests on the above languages are presented and evaluated taking into account the various parameters of the system. This system is also compared to a phoneme to grapheme conversion system which is based on rules.