ISCA Archive Prosody 2001

Prosody and phonetic variability: Lessons learned from acoustic model clustering

Izhak Shafran, Mari Ostendorf, Richard Wright

Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates of prosodic structure. However, multiple sources of evidence suggest that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find acoustic differences primarily associated with segment position at prosodic constituent onsets and at prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a "multiple pronunciation" acoustic model to aid in detecting fluent vs. disfluent prosodic boundaries, and potentially to improve recognition accuracy.