This work introduces a modified many-to-many, EM-driven alignment algorithm for Grapheme-to-Phoneme (G2P) conversion based on Weighted Finite-State Transducers (WFSTs), and presents preliminary experimental results on applying a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints, which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.
Index Terms: G2P, Alignment, RNNLM, WFST