ISCA Archive Interspeech 2014

Methods for efficient semi-automatic pronunciation dictionary bootstrapping

Tim Schlippe, Matthias Merz, Tanja Schultz

In this paper, we propose efficient methods that contribute to rapid and economical semi-automatic pronunciation dictionary development, and we evaluate them on English, German, Spanish, Vietnamese, Swahili, and Haitian Creole. First, we determine optimal strategies for word selection and for the interval at which the grapheme-to-phoneme model is retrained. In addition to the traditional concatenation of the single phonemes most commonly associated with each grapheme, we show that web-derived pronunciations and cross-lingual grapheme-to-phoneme models can help to reduce the initial editing effort. Furthermore, we show that our phoneme-level combination of the output of multiple grapheme-to-phoneme converters reduces the editing effort more than the best single converters. Overall, we report an average relative reduction in editing effort of 15% with our phoneme-level combination compared to conventional methods. An additional relative reduction of 6% was possible by including initial pronunciations from Wiktionary for English, German, and Spanish.
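
The traditional baseline mentioned above, concatenating the phoneme most commonly associated with each grapheme, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `train_grapheme_map` and `initial_pronunciation`, the assumption of one-to-one grapheme-phoneme alignments, and the toy data are all hypothetical and serve only to show the idea.

```python
from collections import Counter, defaultdict


def train_grapheme_map(aligned_pairs):
    """Map each grapheme to the phoneme it is most often aligned with.

    aligned_pairs: iterable of (grapheme, phoneme) tuples, assumed here to
    come from one-to-one alignments of already validated dictionary entries.
    """
    counts = defaultdict(Counter)
    for grapheme, phoneme in aligned_pairs:
        counts[grapheme][phoneme] += 1
    # Keep only the single most frequent phoneme per grapheme.
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def initial_pronunciation(word, grapheme_map, unknown="?"):
    """Concatenate the most common phoneme for each grapheme of the word."""
    return [grapheme_map.get(ch, unknown) for ch in word]


# Toy alignment data, invented for illustration only.
toy_pairs = [
    ("c", "k"), ("a", "a"), ("t", "t"),
    ("c", "s"), ("c", "k"), ("a", "e"), ("a", "a"),
]
gmap = train_grapheme_map(toy_pairs)
print(initial_pronunciation("cat", gmap))  # ['k', 'a', 't']
```

A hypothesis produced this way is only a starting point that a human editor then corrects; graphemes never seen in the alignments are marked with a placeholder rather than guessed.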


doi: 10.21437/Interspeech.2014-595

Cite as: Schlippe, T., Merz, M., Schultz, T. (2014) Methods for efficient semi-automatic pronunciation dictionary bootstrapping. Proc. Interspeech 2014, 2867-2871, doi: 10.21437/Interspeech.2014-595

@inproceedings{schlippe14_interspeech,
  author={Tim Schlippe and Matthias Merz and Tanja Schultz},
  title={{Methods for efficient semi-automatic pronunciation dictionary bootstrapping}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2867--2871},
  doi={10.21437/Interspeech.2014-595},
  issn={2308-457X}
}