ISCA Archive Interspeech 2012
ISCA Archive Interspeech 2012

Improving WFST-based G2p conversion with alignment constraints and RNNLM n-best rescoring

Josef R. Novak, Paul R. Dixon, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka

This work introduces a modified WFST-based multiple to multiple EM-driven alignment algorithm for Grapheme-to-Phoneme (G2P) conversion, and preliminary experimental results applying a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. The RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.

Index Terms: G2P, Alignment, RNNLM, WFST