ISCA Archive Interspeech 2015

Weighted correlation based atom decomposition intonation modelling

Branislav Gerazov, Pierre-Edouard Honnet, Aleksandar Gjoreski, Philip N. Garner

Intonation modelling has been an integral part of text-to-speech systems since their very beginnings. This has led to a proliferation of intonation models, each with its own strengths and weaknesses. Only a few of these models are based on physiology, despite the advantage that such models are language independent. We propose a new intonation model, inspired by the physiology of intonation production, that decomposes the F0 contour into elementary atoms. The model, named the Weighted Correlation Atom Decomposition (WCAD) model, is a generalisation of the command response (CR) model and has the advantage of a simple parameter extraction method. The decomposition follows a matching pursuit approach that uses the perceptually relevant weighted correlation as a cost function. The results affirm the plausibility of using the WCAD model to model F0 contours across different languages and speakers. They also show that the WCAD model performs well in comparison with the CR model, which gives it practical importance.
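
To make the decomposition idea concrete, the following is a minimal sketch of a matching pursuit loop that is scored with a weighted correlation. It is an illustration under stated assumptions and not the authors' implementation: the abstract does not specify the atom shape, the weighting function, or the stopping rule, so the single kernel passed in as `atom`, the per-sample `weights`, the threshold `target_wcorr`, and the function names are all hypothetical. In this sketch, atoms are selected by plain inner product with the residual, and the weighted correlation between the original contour and the reconstruction is used only to decide when to stop.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    denom = np.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def matching_pursuit(f0, weights, atom, max_atoms=10, target_wcorr=0.9):
    """Greedy decomposition of an F0 contour into shifted, scaled atoms.

    Assumptions (not taken from the source): one kernel shape, selection by
    inner product, stopping on a weighted-correlation threshold.
    """
    atom = atom / np.linalg.norm(atom)  # unit-norm kernel
    # Remove the weighted mean; the atoms model deviations around it.
    target = f0 - np.average(f0, weights=weights)
    residual = target.copy()
    recon = np.zeros_like(target)
    picked = []  # (start position, amplitude) of each selected atom
    for _ in range(max_atoms):
        # Inner product of the residual with the atom at every valid shift.
        scores = np.correlate(residual, atom, mode="valid")
        pos = int(np.argmax(np.abs(scores)))
        amp = scores[pos]
        recon[pos:pos + len(atom)] += amp * atom
        residual[pos:pos + len(atom)] -= amp * atom
        picked.append((pos, amp))
        # Stop once the reconstruction matches the contour well enough.
        if weighted_corr(target, recon, weights) >= target_wcorr:
            break
    return picked, recon
```

Calling `matching_pursuit(f0, weights, atom)` with a sampled F0 contour, a weight vector of the same length, and a shorter kernel returns the selected atom positions and amplitudes together with the reconstructed, mean-removed contour. Uniform weights reduce `weighted_corr` to the ordinary Pearson correlation; a perceptually motivated weighting would replace them.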