ISCA Archive SpeechProsody 2008

Learning prosodic sequences using the fundamental frequency variation spectrum

Kornel Laskowski, Jens Edlund, Mattias Heldner

We investigate a recently introduced vector-valued representation of fundamental frequency variation, whose properties appear to be well suited to statistical sequence modeling. We show what the representation looks like, and apply hidden Markov models to learn prosodic sequences characteristic of higher-level turn-taking phenomena. Our analysis shows that the models learn exactly those characteristics which have been reported for the phenomena in the literature. Further refinements to the representation lead to a 12–17% relative improvement in speaker change prediction for conversational spoken dialogue systems.