ISCA Archive SpeechProsody 2004
ISCA Archive SpeechProsody 2004

A maximum likelihood prosody recognizer

Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Jennifer Cole

Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model that models the distribution of the phone-level acoustic-prosodic observations (pitch, duration and energy) and an ANN-based language model that models the word-level stochastic dependence between prosody and syntax. Our experiments on the Radio News Corpus show that our recognizer is able to achieve 84% pitch accent recognition accuracy and 93% intonational phrase boundary (IPB) recognition accuracy in a leave-one-speaker-out task which has exceeded previous reported results on the same corpus. The same recognizer is tested on a subset of Switchboard corpus. The accuracies are degraded but still significantly better than the chance levels.