ISCA Archive Interspeech 2024

Automatic pitch accent classification through image classification

Na Hu, Hugo Schnack, Amalia Arvaniti

The classification of pitch accents has posed significant challenges in automatic intonation labelling. Previous research primarily adopted feature-based approaches, predicting pitch accents from a finite set of features, including acoustic features (F0, duration, intensity) and lexical features. In this study, we explored a novel approach: classifying pitch accents as images represented in pixels. To evaluate this method’s effectiveness, we used a relatively simple classification task involving only two types of pitch accents (H* and L+H*). A basic neural network model trained to classify images of these two accent types (N = 2,025) achieved an average accuracy of 93.5% on the test set across 10 runs, suggesting that this new approach is promising.
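
As an illustration of what "images represented in pixels" could mean in practice, the following is a minimal sketch that rasterizes an F0 contour into a small binary pixel grid. The abstract does not state how the images were produced, so everything here is an assumption made for illustration: the function name `contour_to_image`, the 8 x 8 grid size, the nearest-neighbour resampling, and the min-max scaling of F0 are all hypothetical and are not the authors' procedure.

```python
def contour_to_image(f0, height=8, width=8):
    """Rasterize an F0 contour (list of Hz values) into a height x width grid.

    Hypothetical helper: each column gets exactly one "on" pixel (1) whose
    row reflects the relative F0 height at that point in the contour.
    """
    if not f0:
        raise ValueError("f0 must contain at least one value")
    lo, hi = min(f0), max(f0)
    span = hi - lo
    image = [[0] * width for _ in range(height)]
    for col in range(width):
        # Nearest-neighbour resampling: pick the sample closest to this column.
        idx = round(col * (len(f0) - 1) / (width - 1)) if width > 1 else 0
        # Min-max scale F0 to [0, 1]; a flat contour is placed mid-height.
        rel = (f0[idx] - lo) / span if span > 0 else 0.5
        # Row 0 is the top of the image, so higher F0 maps to a smaller row index.
        row = (height - 1) - round(rel * (height - 1))
        image[row][col] = 1
    return image


if __name__ == "__main__":
    # A steadily rising contour (values in Hz are made up for the example).
    rising = [100, 120, 140, 160, 180, 200, 220, 240]
    for row in contour_to_image(rising):
        print("".join("#" if px else "." for px in row))
```

A grid like this could then be flattened into a pixel vector and passed to any image classifier; the choice of classifier, image resolution, and normalization would all need to follow the paper itself rather than this sketch.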