In a research world where many human-hours are spent labelling, segmenting, checking, and rechecking linguistic information at various levels, automatic analysis can clearly lower the costs (in time as well as funding) of linguistic annotation. More importantly, automatic speech analysis coupled with automatic speech generation allows human-computer interaction to advance towards spoken dialogue. Automatic intonation analysis can aid this advance in both the hearer and speaker roles of computational dialogue: real-time intonation analysis enables the use of intonational cues in speech recognition and understanding tasks, while automatic analysis of developmental speech databases lets researchers easily expand the range of data they model for intonation generation.
This paper presents a series of experiments that test the use of acoustic data in the automatic detection of Tilt intonation events. A set of speaker-dependent HMMs is used to detect accents, boundaries, connections, and silences. A baseline result is obtained, following Taylor [8], by training the models on fundamental frequency (F0) and RMS energy. These baseline figures are then compared with results from experiments that augment the F0 and energy data with cepstral coefficients. In all cases, the first and second derivatives of each feature are also included. The best results show a relative error reduction of 12% over the baseline.
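To make the feature set concrete, the sketch below assembles per-frame observation vectors of the kind described above: F0 and RMS energy, optionally augmented with cepstral coefficients, with first and second derivatives appended to every stream. It is a minimal illustration using librosa; the sample rate, hop size, pitch range, and coefficient count are assumptions for the sketch, not values taken from the paper, whose exact front end is not specified here.

```python
import numpy as np
import librosa


def extract_features(wav_path, n_mfcc=12, hop_length=160, use_cepstra=True):
    """Build a (frames, dims) observation matrix: F0 + RMS energy,
    optionally cepstra, each with first and second derivatives.
    Parameter values are illustrative assumptions."""
    y, sr = librosa.load(wav_path, sr=16000)  # 10 ms hop at 16 kHz

    # Fundamental frequency track; unvoiced frames are mapped to 0 Hz.
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop_length)
    f0 = np.nan_to_num(f0)[np.newaxis, :]

    # RMS energy per frame.
    rms = librosa.feature.rms(y=y, hop_length=hop_length)

    streams = [f0, rms]
    if use_cepstra:
        streams.append(
            librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
        )

    # Trim all streams to a common frame count, then stack feature rows.
    n = min(s.shape[1] for s in streams)
    base = np.vstack([s[:, :n] for s in streams])

    # Append first and second derivatives of every feature row.
    full = np.vstack(
        [base, librosa.feature.delta(base, order=1), librosa.feature.delta(base, order=2)]
    )
    return full.T  # one observation vector per frame, e.g. for HMM training
```

The resulting frame-level matrix is the kind of observation sequence a speaker-dependent HMM system could be trained on; with `use_cepstra=False` it reduces to the F0-plus-energy baseline configuration.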