ISCA Archive ICSLP 2002

Speech recognition using fundamental frequency and voicing in acoustic modeling

Andrej Ljolje

Prosody has long been studied as a knowledge source in speech processing. We attempt to directly exploit prosodic correlates in acoustic modeling of speech for large vocabulary recognition. We compare two methods for using the fundamental frequency and voicing parameters. The more complex approach first models prosodic classes and then uses a representation of their recognized sequences as acoustic features. The simpler approach adds suitably normalized raw values to the conventional mel cepstral coefficients in the observation vectors. This simpler approach achieves modest accuracy gains on the HUB-5 Eval-2001 test set.
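As an illustration of the simpler approach, the sketch below appends a normalized fundamental frequency value and a voicing value to each frame's cepstral vector. The abstract does not specify the normalization, so everything here is an assumption made for the example: the per-utterance z-score of log F0 over voiced frames, the value 0.0 for unvoiced frames, the voicing threshold of 0.5, and the function names `normalize_log_f0` and `augment_frames`.

```python
import math
from typing import List, Sequence


def normalize_log_f0(f0_hz: Sequence[float], voiced: Sequence[bool]) -> List[float]:
    """Z-score log F0 over the voiced frames of one utterance.

    Assumption for this sketch: unvoiced frames (or frames with F0 <= 0)
    are mapped to 0.0, i.e. the utterance mean after normalization.
    """
    log_f0 = [math.log(f) for f, v in zip(f0_hz, voiced) if v and f > 0.0]
    if not log_f0:
        return [0.0] * len(f0_hz)
    mean = sum(log_f0) / len(log_f0)
    var = sum((x - mean) ** 2 for x in log_f0) / len(log_f0)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant F0
    return [
        (math.log(f) - mean) / std if v and f > 0.0 else 0.0
        for f, v in zip(f0_hz, voiced)
    ]


def augment_frames(
    cepstra: Sequence[Sequence[float]],
    f0_hz: Sequence[float],
    voicing: Sequence[float],
    threshold: float = 0.5,  # hypothetical voiced/unvoiced decision threshold
) -> List[List[float]]:
    """Append [normalized log F0, voicing] to each frame's cepstral vector."""
    if not (len(cepstra) == len(f0_hz) == len(voicing)):
        raise ValueError("all inputs must have one entry per frame")
    voiced = [p >= threshold for p in voicing]
    norm_f0 = normalize_log_f0(f0_hz, voiced)
    return [list(c) + [z, p] for c, z, p in zip(cepstra, norm_f0, voicing)]
```

Each output frame has two more dimensions than its input cepstral vector, so an acoustic model trained on the augmented observations sees the prosodic values alongside the spectral ones.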