Previously, we proposed two voice-to-phoneme conversion algorithms for speaker-independent voice-tag creation specifically targeted at applications on embedded platforms, an environment sensitive to CPU and memory resource consumption [1]. These two algorithms (batch mode and sequential) were applied in a same-language context, i.e., both acoustic model training and voice-tag creation and application were performed on the same language.
In this paper, we investigate the cross-language application of these two voice-to-phoneme conversion algorithms, where the acoustic models are trained on a particular source language while the voicetags are created and applied on a different target language. Here, both algorithms create phonetic representations of a voice-tag of a target language based on the speaker-independent acoustic models of a distinct source language. Our experiments show that recognition performances of these voice-tags vary depending on the source-target language pair, with the variation reflecting the predicted phonological similarity between the source and target languages. Among the most similar languages, performance nears that of the native-trained models and surpasses the native reference baseline.