ISCA Archive SpeechProsody 2012

Multi-stage feature normalization for robust German stressed/unstressed syllable classification

Yuan-Fu Liao, Yan-Ting Chen, Jhen-Lun Huang

To develop a German computer-assisted language learning (CALL) system for students whose mother tongues are syllable- or mora-timed, a multi-stage feature normalization scheme that takes both word stress and sentence intonation patterns into consideration is proposed for German stressed/unstressed syllable classification. The main idea is to first apply the Fujisaki model to the pitch contour and band-pass filtering to the energy contour to remove the undesired sentence intonation component, and then to normalize the extracted features sequentially at the syllable level and the supra-segment level. Compared with a traditional Z-score feature normalization baseline, the proposed method achieved a lower classification error rate (27.04% vs. 31.34%) on the “The Kiel Corpus of Read Speech, Vol. I” database. In addition, integrating decision tree-based feature selection and long-span contextual prosodic cues further reduced the classification error rate to 24.68%.

Index Terms: prosodic feature normalization, German stressed/unstressed syllable classification, Fujisaki model
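
The abstract contrasts a global Z-score baseline with normalization applied in stages. It does not give the exact formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each syllable is represented by a list of frame-level feature values (for example, pitch after the sentence intonation component has been removed), that the syllable-level step reduces each syllable to its mean, and that the supra-segment-level step rescales each syllable against a short window of neighbouring syllables. The function names, the `window` parameter, and the choice of the mean as the syllable summary are all hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Baseline: global Z-score over all values (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Constant input carries no contrast; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def two_stage_normalize(syllable_frames, window=3):
    """Hypothetical two-stage normalization (an assumption, not the paper's formula).

    Stage 1 (syllable level): summarize each syllable by the mean of its frames.
    Stage 2 (supra-segment level): Z-score each syllable summary against a
    local window of neighbouring syllables centred on it.
    """
    # Stage 1: one value per syllable.
    syllable_means = [mean(frames) for frames in syllable_frames]

    # Stage 2: local Z-score; the window is truncated at utterance edges.
    half = window // 2
    normalized = []
    for i, value in enumerate(syllable_means):
        context = syllable_means[max(0, i - half): i + half + 1]
        mu = mean(context)
        sigma = pstdev(context)
        normalized.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return normalized


if __name__ == "__main__":
    # Toy utterance: three syllables with made-up frame-level values.
    utterance = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(two_stage_normalize(utterance, window=3))
```

The local window means a syllable is judged relative to its immediate neighbours rather than the whole corpus, which is one plausible way to keep a slowly varying sentence-level trend from dominating the stressed/unstressed contrast. The window length here is arbitrary and would need tuning on real data.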