In this paper, we propose a note-based query-by-humming (QBH) system
built on a Hidden Markov Model (HMM) and a Convolutional Neural Network
(CNN). We adopt a note-based design because note-based systems are much
more efficient than traditional frame-based systems. A note-based QBH
system has two main components: humming transcription and candidate
melody retrieval.
For humming transcription, we are the first to use a hybrid model that
combines an HMM and a CNN. We use the CNN for its ability to learn
features directly from raw audio data and to model the locality and
variability often present within a note, and we use the HMM to handle
variability along the time axis.
For candidate melody retrieval, we use locality-sensitive hashing
to narrow down the set of candidates, and dynamic time warping
and earth mover’s distance for the final ranking of the selected
candidates.
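
As a rough illustration of the final ranking step, the following Python sketch ranks candidate melodies by a textbook dynamic time warping cost over note pitch sequences. The MIDI-pitch representation, the absolute-difference local cost, and the names `dtw_distance` and `rank_candidates` are assumptions made for illustration only; the sketch omits the locality-sensitive hashing and earth mover's distance stages and should not be read as the system's actual implementation.

```python
def dtw_distance(query, candidate):
    """Textbook dynamic time warping cost between two pitch sequences.

    Assumption: each sequence is a list of note pitches (e.g. MIDI numbers),
    and the local cost is the absolute pitch difference.
    """
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning query[:i] with candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - candidate[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query note repeated against the candidate
                cost[i][j - 1],      # candidate note repeated against the query
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return candidate names sorted by increasing DTW cost to the query.

    Assumption: `candidates` maps a (hypothetical) melody name to its pitch list.
    """
    return sorted(candidates, key=lambda name: dtw_distance(query, candidates[name]))
```

In this sketch, a candidate that differs from the query only by a repeated note incurs zero cost, which is the kind of tempo tolerance that motivates dynamic time warping for hummed queries.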
We show that our HMM-CNN humming transcription system outperforms
other state-of-the-art humming transcription systems by ~2% under
the transcription evaluation framework of Molina et al., and that our
overall query-by-humming system achieves a Mean Reciprocal Rank of 0.92
on the standard MIREX dataset, which is higher than that of other
state-of-the-art note-based query-by-humming systems.
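
For reference, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the correct melody appears. The Python sketch below shows the standard definition; the function name, the input layout, and the convention that a query whose target is missing from the ranking contributes zero are assumptions for illustration, not details taken from the evaluation reported above.

```python
def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the correct item over all queries.

    Assumptions: ranked_lists[i] is the ordered list of retrieved melody ids
    for query i, and ground_truth[i] is the id of the correct melody. A query
    whose correct melody is not retrieved contributes 0 to the sum.
    """
    total = 0.0
    for ranking, target in zip(ranked_lists, ground_truth):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)  # ranks are 1-based
    return total / len(ground_truth)
```

For example, if the correct melody is ranked first for one query and second for another, the two reciprocal ranks are 1 and 0.5, giving a Mean Reciprocal Rank of 0.75.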