State-of-the-art speech recognition systems use data-intensive context-dependent
phonemes as acoustic units. However, these approaches do not translate
well to low-resource languages, where large amounts of training data
are not available. For such languages, automatic discovery of acoustic
units is critical. In this paper, we demonstrate the application of
nonparametric Bayesian models to acoustic unit discovery. We show that
the discovered units are correlated with phonemes and therefore are
linguistically meaningful.
We also present a
query-by-example spoken term detection (STD) algorithm based on these
automatically learned units. We show that our proposed system achieves
a P@N of 61.2% and an EER of 13.95% on the TIMIT dataset. Compared with
the best reported system in the literature, the EER improves by 5%, while
P@N is only slightly lower.
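
For readers unfamiliar with the two metrics, the following is a minimal sketch of how P@N and EER are commonly computed from a list of detection scores and ground-truth labels. It is an illustration under stated assumptions, not the evaluation code used for the reported results: the function names `precision_at_n` and `equal_error_rate` are hypothetical, higher scores are assumed to mean a more confident detection, and the EER is approximated by sweeping a threshold over the observed scores.

```python
def precision_at_n(scores, labels):
    """P@N: fraction of true hits among the N top-scoring candidates,
    where N is the number of true occurrences of the query term."""
    n = sum(labels)
    if n == 0:
        raise ValueError("P@N is undefined without any true occurrence")
    # Rank candidates from most to least confident (assumed convention).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:n]) / n


def equal_error_rate(scores, labels):
    """Approximate EER: sweep a threshold over the observed scores and
    return the mean of the false acceptance and false rejection rates
    at the threshold where the two are closest."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("EER needs both positive and negative examples")
    best_gap, best_eer = None, None
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # negatives accepted
        frr = sum(s < t for s in pos) / len(pos)   # positives rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy example with made-up scores; 1 marks a true occurrence of the query.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(precision_at_n(scores, labels))
print(equal_error_rate(scores, labels))
```

In this toy example, two of the three top-ranked candidates are true hits, and the false acceptance and false rejection rates coincide at a threshold of 0.7. Finer estimates of the EER would interpolate between thresholds rather than pick the closest observed one.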