ISCA Archive Interspeech 2012

Local-feature-map integration using convolutional neural networks for music genre classification

Toru Nakashika, Christophe Garcia, Tetsuya Takiguchi

A map-based approach, which treats 2-dimensional acoustic features using image analysis, has recently attracted attention in music genre classification. While this approach extracts local musical patterns more successfully than frame-based methods, in most works the extracted features are not sufficient for music genre classification. In this paper, we focus on appropriate feature extraction and proper classification by integrating automatically learnt image features. For the musical feature extraction, we build gray-level co-occurrence matrix (GLCM) descriptors with different offsets from a short-term mel spectrogram. These feature maps are then classified jointly using convolutional neural networks (ConvNets). In our experiments, we obtained a large improvement of more than 10 points in classification accuracy on the GTZAN database, compared with other ConvNets-based methods.

Index Terms: music genre classification, music information retrieval, music feature extraction, convolutional neural networks
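
The GLCM feature-map step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the number of gray levels, the offsets, the helper names (`quantize`, `glcm`), and the random array standing in for a short-term mel spectrogram are all assumptions chosen for illustration. The sketch quantizes a 2-D map into gray levels, counts co-occurring level pairs at each offset, and stacks the resulting matrices as channels that a ConvNet could take as input.

```python
import numpy as np

def quantize(spec, levels=8):
    """Map a real-valued 2-D array to integer gray levels 0..levels-1."""
    lo, hi = spec.min(), spec.max()
    if hi == lo:
        return np.zeros(spec.shape, dtype=int)
    scaled = (spec - lo) / (hi - lo)
    # The maximum value scales to exactly `levels`, so clip it to the top bin.
    return np.minimum((scaled * levels).astype(int), levels - 1)

def glcm(img, offset, levels=8):
    """Normalized co-occurrence counts of levels (i, j) at offset (dr, dc)."""
    dr, dc = offset
    rows, cols = img.shape
    # Reference pixels whose displaced neighbour still lies inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = img[r0:r1, c0:c1]
    nbr = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    mat = np.zeros((levels, levels), dtype=float)
    # Unbuffered add so repeated (i, j) pairs are all counted.
    np.add.at(mat, (ref.ravel(), nbr.ravel()), 1)
    total = mat.sum()
    return mat / total if total > 0 else mat

# Hypothetical stand-in for a short-term mel spectrogram (mel bands x frames).
rng = np.random.default_rng(0)
mel_spec = rng.random((40, 64))

# Illustrative offsets only (time, frequency, and two diagonals).
offsets = [(0, 1), (1, 0), (1, 1), (-1, 1)]
levels = 8
q = quantize(mel_spec, levels)
maps = np.stack([glcm(q, off, levels) for off in offsets])
print(maps.shape)  # one (levels x levels) map per offset
```

Each offset yields one map capturing how gray levels co-occur along a particular direction and distance in the time-frequency plane; stacking them gives a multi-channel input analogous to the channels of an image.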