ISCA Archive Eurospeech 1999

Multilingual prosody modelling using cascades of regression trees and neural networks

J. W. A. Fackrell, H. Vereecken, J.-P. Martens, Bert Van Coile

This paper describes the use of automatically trained models (Regression Trees and Multilayer Perceptrons) to predict three prosodic variables: phrase-boundary strength, word prominence and phoneme duration. The models are arranged in a cascade, so that the predicted phrase boundaries are used as input features to the prominence model, and so on down the chain. Cascade models of this type have been constructed for six languages, using specially constructed databases, and objective performance statistics are reported. For two languages (American English and Dutch), the results of a subjective evaluation experiment suggest that these prosodic models are at least as good as hand-crafted models, and sometimes better. Furthermore, preparing the training data automatically, rather than by manual labelling, appears to have no negative impact on model performance.
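
The cascade arrangement can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three stage functions are hypothetical hand-written rules standing in for the trained Regression Trees and Multilayer Perceptrons, and the feature names (`followed_by_punctuation`, `is_content_word`) and all numeric constants are assumptions chosen for illustration. For simplicity, a single feature dictionary is used at every stage, although the paper predicts prominence per word and duration per phoneme. Only the data flow reflects the abstract: each stage's prediction is added to the features seen by the next stage.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Stage = Tuple[str, Callable[[Features], float]]


def run_cascade(features: Features, stages: List[Stage]) -> Features:
    """Run the stages in order, adding each prediction to the feature set."""
    enriched = dict(features)  # copy, so the caller's dict is not mutated
    for name, model in stages:
        # Each model sees the original features plus all earlier predictions.
        enriched[name] = model(enriched)
    return enriched


# Hypothetical stand-ins for the trained models (toy rules, not from the paper).
def boundary_model(f: Features) -> float:
    return 3.0 if f["followed_by_punctuation"] else 0.0


def prominence_model(f: Features) -> float:
    # Uses the boundary prediction from the previous stage as an input feature.
    boost = 0.5 if f["boundary_strength"] > 0 else 0.0
    return 1.0 * f["is_content_word"] + boost


def duration_model(f: Features) -> float:
    # Uses both earlier predictions; the result is a duration in milliseconds.
    return 80.0 + 20.0 * f["prominence"] + 10.0 * f["boundary_strength"]


STAGES: List[Stage] = [
    ("boundary_strength", boundary_model),
    ("prominence", prominence_model),
    ("duration_ms", duration_model),
]

if __name__ == "__main__":
    word = {"followed_by_punctuation": 1.0, "is_content_word": 1.0}
    print(run_cascade(word, STAGES))
```

The order of `STAGES` matters: moving `prominence_model` ahead of `boundary_model` would raise a `KeyError`, because the feature it depends on would not yet have been predicted.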