With the rapid increase in the number of music documents registered for copyright every year, detecting music plagiarism has become both more important and more difficult. While most current research focuses on monophonic music, we design a method for finding similarities between polyphonic music documents, which can subsequently be used for the automatic identification of music plagiarism. To extract features from each song, we use non-negative matrix factorization (NMF), which has recently been used effectively in signal separation. We also compute a number of features directly from the music content, which, when combined with the NMF features, lead to accurate results. A modified version of the dynamic time warping algorithm is used to compare the features extracted from two songs. We build a database of almost 3000 songs to train a random forest classifier, and test the method on successful plagiarism suits obtained from the Music Copyright Infringement Resource of the UCLA School of Law. Preliminary results show an accuracy of 78.4%.
Index Terms: music plagiarism detection, polyphonic music, similarity measures, compositional models, monaural signal separation
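
As a minimal illustration of two building blocks named above, the sketch below factorizes a non-negative matrix with the standard multiplicative-update NMF and compares two feature sequences with classic dynamic time warping. It is not the implementation evaluated here: the function names (`nmf`, `dtw_distance`), the Euclidean objective, the rank, and the iteration count are all assumptions chosen for illustration, and the DTW shown is the textbook algorithm rather than the modified version used in this work.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (n x m) as W @ H.

    Uses the standard multiplicative updates for the Euclidean
    objective. `rank`, `n_iter`, and `seed` are illustrative
    assumptions, not values taken from the paper.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Updates keep W and H non-negative as long as V is non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def _as_frames(x):
    """Return x as a 2-D array whose rows are frames."""
    x = np.asarray(x, dtype=float)
    return x.reshape(-1, 1) if x.ndim == 1 else x


def dtw_distance(a, b):
    """Classic (unmodified) DTW cost between two feature sequences."""
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]
```

In such a sketch, the columns of `H` (one per time frame) could serve as a per-song feature sequence, so that `dtw_distance(H1.T, H2.T)` gives one similarity feature for a pair of songs; how the features are actually combined and passed to the classifier is not specified in this abstract.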