ISCA Archive SpeechProsody 2006

A method for decomposing and modeling jitter in expressive speech in Chinese

Lei Wang, Aijun Li, Qiang Fang

Jitter is considered one of the most crucial factors in synthesizing natural emotional speech. Unlike traditional methods of measuring jitter in emotional speech, this paper proposes that the jitter in speech can be decomposed into two parts: deterministic jitter and random jitter. Deterministic jitter is associated with specific causes, which may include the effect of emotional state, while random jitter results from random events unrelated to emotion. In addition, two methods of modeling the jitter distribution are described: jitter decomposition, which relies on the fact that mixed jitter can be divided into a deterministic part and a random part, and an algorithm based on a Gaussian mixture model (GMM), which tries to approximate the shape of the histogram of the jitter distribution. The results give a qualitative analysis of the two methods. Much work remains to carry out a more detailed and quantitative analysis.
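The GMM-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their jitter definition, component count, or fitting procedure, so everything below is an assumption. Here jitter is taken as the difference between consecutive pitch periods, the data are synthetic, and a two-component one-dimensional mixture is fitted with plain EM. The names `local_jitter`, `synthetic_jitter`, and `fit_gmm_1d` are hypothetical.

```python
import math
import random


def local_jitter(periods):
    """Assumed jitter measure: difference between consecutive pitch periods."""
    return [b - a for a, b in zip(periods, periods[1:])]


def synthetic_jitter(n, offset=0.2, sigma=0.05, seed=0):
    """Synthetic data (an assumption): a fixed +/- offset plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.choice([-offset, offset]) + rng.gauss(0.0, sigma) for _ in range(n)]


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to data with plain EM."""
    n = len(data)
    ordered = sorted(data)
    # Initialise means at evenly spaced quantiles and share the overall variance.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    grand_mean = sum(data) / n
    overall_var = max(sum((x - grand_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


if __name__ == "__main__":
    samples = synthetic_jitter(2000)
    w, m, v = fit_gmm_1d(samples, k=2)
    for wj, mj, vj in sorted(zip(w, m, v), key=lambda t: t[1]):
        print(f"weight={wj:.3f} mean={mj:.3f} std={math.sqrt(vj):.3f}")
```

In this synthetic setting, the distance between the fitted means could be read as a systematic offset and the fitted standard deviations as the random spread. Whether that reading matches the paper's deterministic and random parts is an assumption, not something the abstract states.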