ISCA Archive Interspeech 2014

Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems

David Harwath, James R. Glass

Modern speech recognizers rely on three core components: an acoustic model, a language model, and a pronunciation lexicon. In order to expand speech recognition capability to low-resource languages and domains, techniques that peel away the expert knowledge required to craft these three components have been growing in popularity. In this paper, we present a method for automatically learning a weighted pronunciation lexicon in a data-driven fashion without assuming the existence of any phonetic lexicon whatsoever. Given an initial grapheme acoustic model, our method utilizes a novel technique for semi-constrained acoustic unit decoding, which is used to help train a letter-to-sound (L2S) model. The L2S model is then used in conjunction with a Pronunciation Mixture Model (PMM) to infer a pronunciation lexicon. We evaluate our method on English as well as Lao and Haitian, two low-resource languages featured in the IARPA Babel program.
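To make the abstract's pipeline concrete, the sketch below illustrates the general idea of combining L2S pronunciation scores with acoustic evidence to produce a weighted lexicon entry, in the spirit of a Pronunciation Mixture Model. It is not the authors' implementation: the data structures (l2s_candidates, acoustic_scores), the example word, and the scores are all hypothetical placeholders standing in for L2S hypotheses and forced-alignment likelihoods from a real system.

```python
# Minimal PMM-style sketch (hypothetical data, not the paper's code):
# re-estimate mixture weights over candidate pronunciations with EM.
from collections import defaultdict
import math

# Hypothetical L2S candidates: word -> {pronunciation: L2S log-probability}
l2s_candidates = {
    "data": {"d ey t ah": math.log(0.6), "d ae t ah": math.log(0.4)},
}

# Hypothetical per-utterance acoustic log-likelihoods for each candidate
# pronunciation (in a real system these come from forced alignment).
acoustic_scores = {
    "data": [
        {"d ey t ah": -100.0, "d ae t ah": -103.0},
        {"d ey t ah": -98.0, "d ae t ah": -97.5},
    ],
}

def em_weights(word, n_iters=10):
    """Estimate pronunciation mixture weights for one word via EM."""
    prons = list(l2s_candidates[word])
    # Initialize mixture weights from the L2S scores.
    weights = {p: math.exp(l2s_candidates[word][p]) for p in prons}
    total = sum(weights.values())
    weights = {p: w / total for p, w in weights.items()}
    for _ in range(n_iters):
        counts = defaultdict(float)
        for utt in acoustic_scores[word]:
            # E-step: posterior over pronunciations for this utterance.
            peak = max(utt.values())
            joint = {p: weights[p] * math.exp(utt[p] - peak) for p in prons}
            z = sum(joint.values())
            for p in prons:
                counts[p] += joint[p] / z
        # M-step: normalize expected counts into new mixture weights.
        n = sum(counts.values())
        weights = {p: counts[p] / n for p in prons}
    return weights

print(em_weights("data"))  # weighted lexicon entry for the word "data"
```

The resulting weights play the role of a weighted lexicon entry; the paper's actual PMM additionally ties this estimation to the grapheme acoustic model and the semi-constrained decoding step described above.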


doi: 10.21437/Interspeech.2014-568

Cite as: Harwath, D., Glass, J.R. (2014) Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems. Proc. Interspeech 2014, 2655-2659, doi: 10.21437/Interspeech.2014-568

@inproceedings{harwath14b_interspeech,
  author={David Harwath and James R. Glass},
  title={{Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2655--2659},
  doi={10.21437/Interspeech.2014-568},
  issn={2308-457X}
}