ISCA Archive Eurospeech 1999

Two-class signal segmentation for speech/music detection in audio tracks

Mouhamadou Seck, Frédéric Bimbot, Didier Zugaj, Bernard Delyon

We present a technique for the segmentation of a sound track into two classes of segments. Each signal frame is preprocessed by extracting cepstral coefficients and their first-order derivatives. For each class, the distribution of the frame parameter vectors is modeled by a Gaussian Mixture Model (GMM). The GMM order is selected using two criteria: the Minimum Description Length (MDL) criterion and the Akaike Information Criterion (AIC). The frame score is based on a weighted log-likelihood ratio in a window around the frame. The decision for each frame is taken by comparing its score to a threshold. Experiments are presented on speech/music segmentation in audio tracks. In these experiments, the MDL criterion leads to a reasonable GMM order. Using the MDL criterion for GMM order selection, the frame classification error rate is around 20%. However, using GMMs with much lower orders degrades performance only marginally.
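
The order selection and frame decision steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the window shape, the covariance structure, or the threshold, so the triangular weights, the diagonal-covariance parameter count, the zero threshold, and all function names below are assumptions made for illustration. The per-frame log-likelihood ratios and the per-order training log-likelihoods are taken as inputs, as if already computed from the two class GMMs.

```python
import math

def n_free_params(n_components, dim):
    # Assumption: diagonal-covariance GMM, so each component has
    # dim means + dim variances, plus (n_components - 1) free weights.
    return n_components * 2 * dim + (n_components - 1)

def aic(log_likelihood, n_params):
    # Akaike Information Criterion (lower is better).
    return -2.0 * log_likelihood + 2.0 * n_params

def mdl(log_likelihood, n_params, n_frames):
    # Minimum Description Length in its usual two-part form (lower is better).
    return -log_likelihood + 0.5 * n_params * math.log(n_frames)

def select_order(candidates, n_frames, dim, criterion="mdl"):
    # candidates: {gmm_order: total training log-likelihood at that order}.
    def cost(order):
        k = n_free_params(order, dim)
        ll = candidates[order]
        return mdl(ll, k, n_frames) if criterion == "mdl" else aic(ll, k)
    return min(candidates, key=cost)

def windowed_scores(llr, half_width):
    # Weighted average of per-frame log-likelihood ratios in a window
    # centred on each frame. Triangular weights are an assumption; the
    # window is truncated at the track boundaries.
    n = len(llr)
    scores = []
    for t in range(n):
        num = 0.0
        den = 0.0
        for k in range(-half_width, half_width + 1):
            if 0 <= t + k < n:
                w = half_width + 1 - abs(k)
                num += w * llr[t + k]
                den += w
        scores.append(num / den)
    return scores

def classify(llr, half_width=1, threshold=0.0):
    # llr[t] = log p(frame t | speech GMM) - log p(frame t | music GMM).
    # A score above the (hypothetical) threshold is labelled speech.
    return ["speech" if s > threshold else "music"
            for s in windowed_scores(llr, half_width)]

# Made-up numbers, for illustration only.
example_ll = {1: -5000.0, 2: -4800.0, 4: -4785.0}
order_mdl = select_order(example_ll, n_frames=1000, dim=2, criterion="mdl")
order_aic = select_order(example_ll, n_frames=1000, dim=2, criterion="aic")
labels = classify([1.0, 1.0, 1.0, -1.0, -1.0, -1.0], half_width=1)
```

With these made-up log-likelihoods, the heavier MDL penalty selects a smaller order than AIC does. The window also smooths isolated frames: a single frame whose ratio mildly favours the other class is outvoted by its neighbours.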