ISCA Archive ICSLP 2002

Robust speech / music classification in audio documents

Julien Pinquier, Jean-Luc Rouas, Régine André-Obrecht

This paper presents a novel approach to speech / music segmentation. Three original features are extracted: entropy modulation, stationary segment duration, and number of segments. They are combined with the classical 4 Hz modulation energy. The relevance of these features is studied in a first experiment on a development corpus of collected speech and music samples. A second corpus is used to verify the robustness of the algorithm: this experiment, run on a TV movie soundtrack, reaches a correct identification rate of 90%.
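
As an illustration of the classical 4 Hz modulation energy mentioned above, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the feature is computed, so everything here is an assumption: the function names, the 20 ms non-overlapping frames, and the normalization by the total variance of the energy envelope are all hypothetical choices. The idea is only that speech tends to modulate its energy at roughly the syllabic rate (around 4 Hz), so the share of envelope variance found at 4 Hz is expected to be higher for speech than for steady music.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def modulation_energy_4hz(samples, sample_rate, frame_len=160, target_hz=4.0):
    """Share of the energy-envelope variance located at target_hz.

    Hypothetical simplification: a single DFT bin of the mean-removed
    frame-energy sequence, normalized by that sequence's total energy.
    """
    env = frame_energies(samples, frame_len)
    if len(env) < 2:
        return 0.0
    mean = sum(env) / len(env)
    centered = [e - mean for e in env]
    total = sum(c * c for c in centered)
    # Treat a numerically flat envelope (e.g. a steady tone) as unmodulated.
    if total <= 1e-12 * len(env) * mean * mean:
        return 0.0
    frame_rate = sample_rate / frame_len          # envelope sampling rate in Hz
    w = 2.0 * math.pi * target_hz / frame_rate    # radians per frame at target_hz
    re = sum(c * math.cos(w * k) for k, c in enumerate(centered))
    im = sum(c * math.sin(w * k) for k, c in enumerate(centered))
    # Equals 1 for a pure sinusoidal envelope at target_hz over whole periods.
    return 2.0 * (re * re + im * im) / (len(centered) * total)
```

With these assumptions, a tone whose amplitude is modulated at 4 Hz scores close to 1, while a steady tone scores 0. A real system would typically compute this per frequency band and over a sliding window; those details are not given in this abstract.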