ISCA Archive Prosody 2001

Acoustic classification of intonational events

Stefan Werner

My current research seeks ways to classify intonational events automatically on the basis of their acoustic realization, without any preconceptions about structures in the data. In particular, a priori references to the categories of intonation models, whether ToBI or any other, are strictly avoided; instead, the data space is searched for clusters before any interpretation is applied.

For clustering, self-organizing maps are mainly used. They are applied to high-dimensional acoustic feature vectors containing all available temporal and F0 information related to intonational events. In a preliminary step, the continuous F0 curve is transformed into a sequence of turning points (straight-line stylization), which are then grouped in pairs; each pair is measured for the various acoustic parameters of the feature vector. The resulting clusters can be examined further with statistical methods in order to search for rules by which they can be constructed.
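
The processing chain described above can be illustrated with a minimal sketch. It is not the implementation used in this research, and several choices are assumptions made only for illustration: turning points are taken to be simple local peaks and valleys of a voiced F0 contour, pairs are formed from adjacent turning points, the feature vector holds just three hypothetical parameters (duration, F0 change in semitones, slope), and the self-organizing map is a small one-dimensional chain of units rather than a full two-dimensional map. Features are not normalized here, although in practice that would matter.

```python
import math
import random

def turning_points(times, f0):
    """Return (time, f0) points where the contour changes direction,
    plus both endpoints. Assumes f0 > 0 throughout (no unvoiced gaps);
    flat plateaus are not treated as turning points."""
    points = [(times[0], f0[0])]
    for i in range(1, len(f0) - 1):
        before = f0[i] - f0[i - 1]
        after = f0[i + 1] - f0[i]
        if before * after < 0:  # sign change: local peak or valley
            points.append((times[i], f0[i]))
    points.append((times[-1], f0[-1]))
    return points

def pair_features(points):
    """Build one feature vector per pair of adjacent turning points.
    The three features are illustrative, not the author's parameter set."""
    vectors = []
    for (t0, f_start), (t1, f_end) in zip(points, points[1:]):
        duration = t1 - t0
        change = 12 * math.log2(f_end / f_start)  # F0 change in semitones
        vectors.append([duration, change, change / duration])
    return vectors

def best_unit(units, v):
    """Index of the unit closest to v (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], v)))

def train_som(vectors, n_units=4, epochs=50, seed=0):
    """Train a one-dimensional self-organizing map (a chain of units)."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs              # decays from 1 towards 0
        rate = 0.5 * frac                      # learning rate
        radius = max(1.0, n_units / 2 * frac)  # neighbourhood width
        for v in vectors:
            win = best_unit(units, v)
            for j, u in enumerate(units):
                # Units near the winner on the chain are pulled more strongly.
                influence = math.exp(-((j - win) ** 2) / (2 * radius ** 2))
                for k in range(len(u)):
                    u[k] += rate * influence * (v[k] - u[k])
    return units

# Toy contour: times in milliseconds, F0 in Hz (made-up values).
times = [0, 100, 200, 300, 400, 500, 600]
f0 = [100, 120, 140, 130, 110, 115, 125]

points = turning_points(times, f0)
vectors = pair_features(points)
units = train_som(vectors, n_units=2, epochs=20)
for v in vectors:
    print(v, "-> unit", best_unit(units, v))
```

Each pair of turning points is assigned to its best-matching unit, and the units then serve as candidate clusters for the later statistical examination.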

The feature vectors can also be used as metarepresentations, e.g. in the comparison of different intonation models, either directly or via empirical data. The design and evaluation of intonation models (potentially for both recognition and synthesis) should profit from this approach, in which no premature phonological reasoning can ‘contaminate’ the data and comparisons of linguistically incompatible models become possible.