ISCA Archive ICSLP 2002

Expanded examinations of a low frequency modulation feature for speech/music discrimination

Stefan Karnebäck

A low-frequency modulation feature, LFMAD, was examined under several conditions to assess its robustness for speech/music discrimination. The feature was tested on low-frequency (LF) components from 2 Hz to 27 Hz and with different analysis window sizes. It performs best when the analysis window contains only one period of the LF component used. When the music contained a large proportion of vocals, the error rate in the speech/music discrimination task increased compared with purely instrumental music. This effect was found for LFMAD as well as for the MFCC feature, which was used for comparison. Tests were also carried out with signals in additive noise at SNRs from 30 dB down to 0 dB. LFMAD performed better than MFCC in these tests. The error rate was higher for speech signals. There was a bias towards classifying data as music when the test conditions diverged from the training conditions. This effect was less pronounced for LFMAD than for MFCC. The best results in this study were obtained by combining LFMAD and MFCC into a mixed feature. This mixed feature appears to be more robust for speech/music discrimination and could be recommended when scanning databases of unknown quality for speech events.
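Two of the test conditions mentioned in the abstract can be made concrete with a short sketch: the length of an analysis window that spans one period of an LF component, and the mixing of noise into a signal at a target SNR. This is not the author's implementation and it does not compute LFMAD itself. The function names, the 16 kHz sample rate, the 440 Hz test tone, and the use of white Gaussian noise are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def one_period_window(lf_hz, sample_rate=16000):
    """Number of samples spanning one period of an LF component.

    The 16 kHz default sample rate is an assumption, not from the paper.
    """
    return round(sample_rate / lf_hz)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise power ratio equals snr_db, then add it."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is p_signal / 10^(snr_db / 10); gain is an amplitude factor.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, computed from the clean signal and the noisy mixture."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_resid)


# Hypothetical test material: a 1 s, 440 Hz tone plus white Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
white = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Window lengths at the two ends of the tested LF range (2 Hz and 27 Hz).
windows = {f: one_period_window(f, sample_rate) for f in (2, 27)}

# Mixtures at the two ends of the tested SNR range (30 dB and 0 dB).
mixtures = {snr: mix_at_snr(tone, white, snr) for snr in (30, 0)}
```

Under the assumed 16 kHz sample rate, one period of the 2 Hz component spans 8000 samples (0.5 s), while one period of the 27 Hz component spans about 593 samples (roughly 37 ms), which shows how strongly the one-period window length depends on the chosen LF component.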