ISCA Archive Interspeech 2012

Data-driven posterior features for low resource speech recognition applications

Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky

In low resource settings with only a few hours of training data, state-of-the-art speech recognition systems, which require large amounts of task-specific training data, perform very poorly. We address this issue by building data-driven speech recognition front-ends on significant amounts of task-independent data from different languages and genres, collected in acoustic conditions similar to those of the low resource scenario. We show that features derived from these trained front-ends perform significantly better and can alleviate the effect of reduced task-specific training data in low resource settings. The proposed features provide an absolute improvement of about 12% (18% relative) in a low-resource LVCSR setting with only one hour of training data. We also demonstrate the usefulness of these features for zero-resource speech applications such as spoken term discovery, which operate without any transcribed speech. The proposed features provide significant gains over conventional acoustic features on various information retrieval metrics for this task.
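To make the idea of posterior features concrete, the sketch below shows one common way such features are extracted: a phoneme classifier (here a single-hidden-layer MLP) trained on task-independent data maps each acoustic frame to phoneme posteriors, whose logs serve as features for the downstream recognizer. The architecture, layer sizes, phoneme count, and use of log posteriors are illustrative assumptions, since the abstract does not specify the front-end configuration.

```python
# Minimal sketch of posterior feature extraction with a task-independent
# MLP phoneme classifier. Random weights stand in for a network that would
# in practice be trained on large multilingual, task-independent corpora.
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class PosteriorFrontEnd:
    """Single-hidden-layer MLP mapping acoustic frames to phoneme posteriors."""

    def __init__(self, n_in=39, n_hidden=1024, n_phones=40, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.01
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_phones)) * 0.01
        self.b2 = np.zeros(n_phones)

    def posteriors(self, frames):
        """frames: (T, n_in) acoustic features -> (T, n_phones) posteriors."""
        h = np.tanh(frames @ self.W1 + self.b1)
        return softmax(h @ self.W2 + self.b2)

    def features(self, frames, eps=1e-10):
        """Log posteriors, a common form of posterior feature for LVCSR."""
        return np.log(self.posteriors(frames) + eps)

# Usage: extract posterior features for 100 frames of 39-dim acoustic features.
frontend = PosteriorFrontEnd()
frames = np.random.randn(100, 39)        # placeholder MFCC-like frames
post_feats = frontend.features(frames)   # (100, 40) log-posterior features
```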

Index Terms: Low-resource speech recognition, spoken term discovery, posterior features.