ISCA Archive Interspeech 2012

Training deep nets with imbalanced and unlabeled data

Jeff Berry, Ian Fasel, Luciano Fadiga, Diana Archangeli

Deep belief networks (DBNs) are normally trained on large data sets. Our goal is to predict traces of the tongue surface in ultrasound images of human speech. Hand-tracing these images is labor-intensive, and the dataset is highly imbalanced because many images are extremely similar. We propose a bootstrapping method that handles this imbalance by iteratively selecting a small subset of images to be hand-traced (thereby reducing human labor time) and then (re)training the DBN. An entropy-based diversity measure guides the initial selection. With this method we achieve more than a two-fold reduction in the human time required for tracing while maintaining human-level accuracy.

Index Terms: deep belief networks, ultrasound imaging, tongue imaging, speech processing, bootstrapping, class imbalance problem
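
The bootstrapping procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the entropy-based diversity measure, so the sketch assumes one plausible form, namely greedily choosing the images that maximize the summed per-pixel Shannon entropy of the selected subset. Images are assumed to be flattened lists of quantized pixel values, and the names `subset_entropy`, `select_diverse`, `bootstrap`, `hand_trace`, `train`, and `pick_next` are all hypothetical.

```python
import math
from collections import Counter


def subset_entropy(images):
    """Sum of per-pixel Shannon entropies (in bits) across a set of images.

    Assumption: each image is a flat sequence of quantized pixel values,
    and all images have the same length.
    """
    n = len(images)
    total = 0.0
    for values in zip(*images):  # one pixel position across all images
        counts = Counter(values)
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total


def select_diverse(images, k):
    """Greedily pick up to k image indices that maximize subset_entropy.

    This is an assumed diversity measure, not the one defined in the paper.
    Ties are broken in favor of the lowest index.
    """
    chosen = []
    remaining = list(range(len(images)))
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda i: subset_entropy([images[j] for j in chosen] + [images[i]]),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


def bootstrap(images, k, rounds, hand_trace, train, pick_next):
    """Skeleton of the select / hand-trace / (re)train loop.

    hand_trace, train, and pick_next are hypothetical callables supplied by
    the caller: a human annotator, a DBN trainer, and a rule for choosing
    which images to trace in later rounds (not specified in the abstract).
    """
    # Initial selection uses the diversity measure.
    labeled = {i: hand_trace(images[i]) for i in select_diverse(images, k)}
    model = train(images, labeled)
    for _ in range(rounds):
        for i in pick_next(model, images, labeled, k):
            labeled[i] = hand_trace(images[i])
        model = train(images, labeled)  # retrain on the enlarged labeled set
    return model, labeled
```

In this sketch, near-duplicate images add almost no entropy to the selected subset, so they are unlikely to be chosen together. That is one way a diversity measure could counteract the imbalance caused by many extremely similar images.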