ISCA Archive ICSLP 1996

Word class driven synthesis of prosodic annotations

Simon Arnfield

Prosody is an important aspect of speech that current text-to-speech synthesis systems fail to mimic in a convincing or natural way [1,2,3,4]. This paper describes research on a partial system for prosodic synthesis using easily derived low-level syntactic information. A computer program has been developed that can annotate unseen text with prosodic stress and tone marks using the sequence of part-of-speech tags previously assigned to each word by a tagging system. Training and testing material was taken from the Lancaster/IBM Spoken English Corpus (SEC). Co-occurrence measures were calculated relating stress and tone mark annotations to the word class annotations. A model built around this statistical information calculates a score for every possible mapping between a given part-of-speech sequence and the potential stress/tone annotations. The highest scoring pattern is selected as the most likely "baseline" annotation according to the model. Agreement with the original corpus annotations reaches up to 91%.
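The scoring idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the scoring formula, so the sketch assumes a sum of smoothed log relative frequencies per word. The tag names, mark labels, toy training data, and the functions `train`, `score`, `annotate`, and `agreement` are all hypothetical.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy training data: (part-of-speech tag, prosodic mark) pairs.
# Tag names and mark labels are illustrative, not the SEC's own symbols.
TRAINING = [
    [("DET", "none"), ("NOUN", "fall"), ("VERB", "stress"), ("NOUN", "fall")],
    [("DET", "none"), ("ADJ", "stress"), ("NOUN", "fall")],
    [("PRON", "none"), ("VERB", "stress"), ("DET", "none"), ("NOUN", "rise")],
]


def train(sentences):
    """Count how often each mark co-occurs with each tag."""
    pair_counts = Counter()
    tag_counts = Counter()
    for sentence in sentences:
        for tag, mark in sentence:
            pair_counts[(tag, mark)] += 1
            tag_counts[tag] += 1
    marks = sorted({mark for (_, mark) in pair_counts})
    return pair_counts, tag_counts, marks


def score(tags, candidate, pair_counts, tag_counts, n_marks):
    """Assumed scoring rule: sum of add-one-smoothed log frequencies."""
    total = 0.0
    for tag, mark in zip(tags, candidate):
        p = (pair_counts[(tag, mark)] + 1) / (tag_counts[tag] + n_marks)
        total += math.log(p)
    return total


def annotate(tags, pair_counts, tag_counts, marks):
    """Score every possible mark sequence and return the highest scoring."""
    candidates = itertools.product(marks, repeat=len(tags))
    best = max(
        candidates,
        key=lambda c: score(tags, c, pair_counts, tag_counts, len(marks)),
    )
    return list(best)


def agreement(predicted, reference):
    """Fraction of positions where the two annotations match."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


pair_counts, tag_counts, marks = train(TRAINING)
predicted = annotate(["DET", "NOUN"], pair_counts, tag_counts, marks)
print(predicted, agreement(predicted, ["none", "rise"]))
```

Because this sketch scores each word independently, the exhaustive search gives the same result as choosing the best mark per word. Enumerating whole sequences only matters once the score depends on neighbouring tags or marks, and the number of candidates grows exponentially with sentence length.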