ISCA Archive Eurospeech 1997

Automatic assignment of part-of-speech to out-of-vocabulary words for text-to-speech processing

Frédéric Béchet, Marc El-Bèze

Working with large text corpora highlights the need for special treatment of Out-Of-Vocabulary (OOV) words. This paper describes a strategy for processing OOV words within a Text-To-Speech (TTS) framework for the French language. A probabilistic module, called "Devin", guesses a Part-Of-Speech (POS) for each OOV word from the morphological structure of the word and the context in which it occurs. These POS labels can be either syntactic or semantic. The semantic labels represent the category of each proper name (family name, town name, etc.) and its linguistic origin, which strongly influences its pronunciation. According to these POS labels, the system chooses the appropriate set of rules to be used by the rule-based grapheme-to-phoneme transcriber of the TTS system.
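
The idea summarized above can be illustrated with a minimal sketch. It is not the Devin module itself: the abstract does not specify the model, so the suffix table, the context table, the label names, the rule-set names, and the functions `guess_pos` and `choose_rule_set` are all hypothetical, and every probability is an invented placeholder. The sketch only shows how a morphological score and a contextual score might be combined to pick a label, and how that label might then select a grapheme-to-phoneme rule set.

```python
# Toy sketch only: all tables, labels, and numbers are invented for
# illustration and do not come from the paper.

# Hypothetical P(label | word suffix): the morphological evidence.
SUFFIX_SCORES = {
    "ment": {"ADVERB": 0.7, "NOUN": 0.3},
    "ville": {"TOWN_NAME": 0.9, "NOUN": 0.1},
    "ski": {"FAMILY_NAME_SLAVIC": 0.9, "NOUN": 0.1},
}
# Fallback used when no known suffix matches.
DEFAULT_SCORES = {"NOUN": 0.5, "FAMILY_NAME_FRENCH": 0.5}

# Hypothetical P(label | label of the previous word): the contextual evidence.
CONTEXT_SCORES = {
    "PREPOSITION": {"TOWN_NAME": 0.6, "NOUN": 0.4},
    "TITLE": {"FAMILY_NAME_SLAVIC": 0.5, "FAMILY_NAME_FRENCH": 0.5},
}
# Small floor so labels unseen in a given context are not ruled out entirely.
CONTEXT_FLOOR = 0.05

# Hypothetical mapping from label to grapheme-to-phoneme rule set.
RULE_SETS = {
    "FAMILY_NAME_SLAVIC": "slavic_name_rules",
    "TOWN_NAME": "french_toponym_rules",
}
DEFAULT_RULE_SET = "french_default_rules"


def guess_pos(word: str, previous_label: str) -> str:
    """Return the label with the highest combined morphology x context score."""
    lowered = word.lower()
    morph = DEFAULT_SCORES
    best_len = 0
    # Prefer the longest matching suffix as morphological evidence.
    for suffix, scores in SUFFIX_SCORES.items():
        if lowered.endswith(suffix) and len(suffix) > best_len:
            morph, best_len = scores, len(suffix)
    context = CONTEXT_SCORES.get(previous_label, {})
    combined = {
        label: p * context.get(label, CONTEXT_FLOOR)
        for label, p in morph.items()
    }
    return max(combined, key=combined.get)


def choose_rule_set(label: str) -> str:
    """Pick the transcription rule set associated with a guessed label."""
    return RULE_SETS.get(label, DEFAULT_RULE_SET)


if __name__ == "__main__":
    for word, prev in [("Kowalski", "TITLE"), ("Trouville", "PREPOSITION")]:
        label = guess_pos(word, prev)
        print(word, label, choose_rule_set(label))
```

In this sketch the two sources of evidence are simply multiplied, which is one common way to combine independent scores; the actual combination used by the authors is not stated in the abstract.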