ISCA Archive Interspeech 2010

Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram

Yow-Bang Wang, Lin-shan Lee

Many recent studies on tone recognition have focused on model-level issues, for either tone and prosody labeling or LVCSR. In contrast, this paper focuses on feature-level issues. We propose normalizing pitch features with both the syllable-level mean and the utterance-level standard deviation, instead of the common approach that uses the utterance-level mean only. We demonstrate the robustness of this normalization through both its affine-invariance property and experimental results. We also incorporate tone posteriorgrams in second-pass tone recognition, which further improves tone recognition accuracy.
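
As a rough illustration of the affine-invariance claim, the sketch below shows one plausible reading of the normalization: each frame's pitch value has its syllable mean subtracted and is then divided by the standard deviation over the whole utterance. The abstract does not give the exact formulation, so the function name `normalize_pitch`, the list-of-syllables input layout, the use of the population standard deviation, and the toy pitch values are all assumptions made for this example. Under these assumptions, mapping every pitch value f to a*f + b with a > 0 leaves the normalized features unchanged.

```python
from statistics import fmean, pstdev


def normalize_pitch(syllables):
    """Normalize per-frame pitch values (hypothetical helper, not from the paper).

    syllables: list of lists, one inner list of pitch values per syllable.
    Each value has its syllable mean subtracted and is divided by the
    standard deviation computed over all frames of the utterance.
    """
    all_frames = [f for syl in syllables for f in syl]
    utt_std = pstdev(all_frames)  # utterance-level (population) std, an assumption
    if utt_std == 0:
        raise ValueError("utterance has constant pitch; cannot normalize")
    normalized = []
    for syl in syllables:
        syl_mean = fmean(syl)  # syllable-level mean
        normalized.append([(f - syl_mean) / utt_std for f in syl])
    return normalized


# Toy utterance with three syllables (made-up pitch values).
utterance = [[5.0, 5.1, 5.3], [5.4, 5.2, 4.9], [4.8, 4.8, 5.0]]

# Apply an affine map f -> a*f + b with a > 0 to every frame.
a, b = 2.0, 0.7
warped = [[a * f + b for f in syl] for syl in utterance]

base = normalize_pitch(utterance)
shifted = normalize_pitch(warped)

# Largest difference between the two normalized versions.
max_diff = max(
    abs(x - y) for s1, s2 in zip(base, shifted) for x, y in zip(s1, s2)
)
```

The invariance follows directly: the offset b cancels when the syllable mean is subtracted, and the scale a cancels because the utterance-level standard deviation is scaled by the same factor. Normalizing by the utterance-level mean alone removes only the offset, not the scale.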