ISCA Archive ICSLP 2002

Estimating syntactic structure from F0 contour and pause duration in Japanese speech

Yasuo Horiuchi, Tomoko Ohsuga, Akira Ichikawa

In this study, we introduce a method for estimating the syntactic structure of Japanese speech from the F0 contour and pause duration. We define a prosodic unit (PU) as a segment bounded by a local minimum of the F0 contour or by a pause. By repeatedly combining adjacent PUs (each pair of PUs is merged into a single PU), a tree structure is built up gradually. For each sequence of three PUs, a discriminant function, obtained by discriminant analysis of a large amount of speech data, decides which pair of PUs should be combined. We applied the method to the ATR 503 Phonetically Balanced Sentences read by four Japanese speakers. The rate of correct judgements for sequences of three PUs was 79%, and the accuracy of estimating the entire syntactic structure of a sentence was 26%. We consider this result fairly good given the difficulty of estimating syntactic structure from prosody alone.
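The bottom-up combination of PUs described above can be illustrated with a minimal sketch. The abstract does not state the order in which three-PU sequences are examined or which features enter the discriminant function, so everything specific here is an assumption: the left-to-right scan, the `Node` type, the `pause_after` feature, and the rule `toy_prefer_left`, which is only a stand-in for the trained discriminant function and simply merges across the shorter pause.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A prosodic unit (PU) or a PU formed by combining two PUs."""
    label: str
    pause_after: float  # hypothetical feature: pause (s) following this unit
    children: Optional[Tuple["Node", "Node"]] = None


def merge(left: Node, right: Node) -> Node:
    """Combine a pair of adjacent PUs into one PU."""
    return Node(f"({left.label} {right.label})", right.pause_after, (left, right))


def toy_prefer_left(a: Node, b: Node, c: Node) -> bool:
    """Toy stand-in for the trained discriminant function (an assumption).

    Returns True if (a, b) should be combined, False if (b, c) should be.
    Here the pair separated by the shorter pause is chosen.
    """
    return a.pause_after <= b.pause_after


def build_tree(
    pus: List[Node],
    prefer_left: Callable[[Node, Node, Node], bool] = toy_prefer_left,
) -> Node:
    """Build a binary tree by repeatedly combining adjacent PUs.

    Assumed strategy: scan three-PU windows from left to right and merge
    the left pair of the first window that prefers it; if no window does,
    merge the right pair of the last window.
    """
    if not pus:
        raise ValueError("at least one prosodic unit is required")
    nodes = list(pus)
    while len(nodes) > 2:
        j = len(nodes) - 2  # default: right pair of the last window
        for i in range(len(nodes) - 2):
            if prefer_left(nodes[i], nodes[i + 1], nodes[i + 2]):
                j = i
                break
        nodes[j:j + 2] = [merge(nodes[j], nodes[j + 1])]
    if len(nodes) == 2:
        return merge(nodes[0], nodes[1])
    return nodes[0]


# Four PUs with a long pause after B: the tree splits at that pause.
example = [Node("A", 0.0), Node("B", 0.3), Node("C", 0.0), Node("D", 0.0)]
print(build_tree(example).label)  # prints ((A B) (C D))
```

Passing a different `prefer_left` callable, for example one backed by a linear discriminant fitted to labelled three-PU sequences, would leave the tree-building loop unchanged.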