ISCA Archive Interspeech 2012

On the use of machine learning methods for speech and voicing classification

Philip Harding, Ben Milner

This work examines the effectiveness of machine learning (ML) classifiers on the problems of voice activity detection and voicing classification. A wide range of ML classifiers is considered, including parametric, probabilistic and non-probabilistic methods, artificial neural networks and regression. Evaluations are carried out in both stationary and non-stationary noise at signal-to-noise ratios down to 0 dB. Compared with conventional methods, the ML methods are found to be significantly more robust, with multilayer perceptrons, Gaussian mixture models and Rotation Forest consistently giving the best performance.

Index Terms: voice activity detection, MFCC, machine learning