ISCA Archive Interspeech 2011

Template-based automatic speech recognition meets prosody

Dino Seppi, Kris Demuynck, Dirk Van Compernolle

In this paper, we use prosodic information to improve the accuracy of our template-based automatic speech recognizer. Prosodic information is harvested with a data-driven approach: a number of prosodic features are extracted, combined into major groups, and then studied both separately and together. All acoustic evidence, both segmental and suprasegmental, is modelled non-parametrically. The different sources of information are combined with segmental conditional random fields. Prosody improves on the state-of-the-art baseline, reducing the word error rate by 7% relative on the Wall Street Journal Nov92 20k trigram task.
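
To make the term "relative" concrete, the following minimal Python sketch shows how a relative word error rate (WER) reduction is usually computed. The function name and the numbers in the usage example are hypothetical and chosen only for illustration; they are not results from the paper, which reports only the 7% relative figure.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent.

    Both arguments are absolute word error rates in percent.
    A positive result means the new system makes fewer errors.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values (NOT taken from the paper): a baseline at 10.0% WER
    # and an improved system at 9.3% WER.
    reduction = relative_wer_reduction(10.0, 9.3)
    print(f"relative WER reduction: {reduction:.1f}%")
```

Under this definition, a relative reduction is measured against the baseline error rate, so the same relative gain corresponds to a smaller absolute gain when the baseline is already strong.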