Deep belief networks (DBNs) are normally trained on large data sets. Our goal is to predict traces of the tongue surface in ultrasound images of human speech. Hand-tracing is labor-intensive, and the data set is highly imbalanced because many images are extremely similar. We propose a bootstrapping method that handles this imbalance by iteratively selecting a small subset of images to be hand-traced (thereby reducing human labor time) and then (re)training the DBN. An entropy-based diversity measure guides the initial selection. The method achieves more than a two-fold reduction in the human time required for tracing while achieving human-level accuracy.
Index Terms: deep belief networks, ultrasound imaging, tongue imaging, speech processing, bootstrapping, class imbalance problem