ISCA Archive Interspeech 2014

Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages

B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan

A polyglot speech synthesizer synthesizes speech for any given monolingual or multilingual text in a single speaker's voice. Building one requires a polyglot speech corpus, but it is difficult to find a speaker proficient in multiple languages. Therefore, in the current work, the acoustic similarity of phonemes across Indian languages is exploited to obtain a polyglot speech corpus for four Indian languages and Indian English using GMM-based cross-lingual voice conversion. The optimum target speaker and GMM topology are chosen based on the performance of a speaker identification system. It is observed that the language that shares the largest number of phonemes with the other languages serves as the best target. A polyglot speech corpus derived in this target speaker's voice is then used to develop an HMM-based polyglot speech synthesizer. The performance of this synthesizer is evaluated in terms of speaker identity using an ABX listening test, quality using the mean opinion score (MOS), and speaker switching using a subjective listening test.
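
The observation about phoneme sharing can be illustrated with a small sketch. This is not the selection procedure of the paper, which chooses the target from the performance of a speaker identification system; it only shows, under stated assumptions, how one might count shared phonemes to see which language overlaps most with the rest. The language names (`lang_A`, `lang_B`, `lang_C`), the phoneme symbols, and the function names are hypothetical placeholders, not data or identifiers from the paper.

```python
from typing import Dict, List, Set, Tuple


def shared_phoneme_score(inventories: Dict[str, Set[str]], candidate: str) -> int:
    """Sum, over every other language, of the phonemes it shares with `candidate`."""
    own = inventories[candidate]
    return sum(
        len(own & other)
        for name, other in inventories.items()
        if name != candidate
    )


def rank_targets(inventories: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank languages by shared-phoneme score, highest first (ties broken by name)."""
    scores = {name: shared_phoneme_score(inventories, name) for name in inventories}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy inventories: placeholder symbols only, NOT real phoneme sets of any language.
toy_inventories = {
    "lang_A": {"a", "i", "u", "k", "t", "p", "m"},
    "lang_B": {"a", "i", "k", "t", "n"},
    "lang_C": {"a", "u", "p", "m", "s"},
}

ranking = rank_targets(toy_inventories)
print(ranking)
```

In this toy setup the first entry of `ranking` is the language with the widest overlap. Whether such a count actually predicts the best target would still have to be checked against the speaker identification results described in the abstract.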


doi: 10.21437/Interspeech.2014-179

Cite as: Ramani, B., Jeeva, M.P.A., Vijayalakshmi, P., Nagarajan, T. (2014) Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages. Proc. Interspeech 2014, 775-779, doi: 10.21437/Interspeech.2014-179

@inproceedings{ramani14_interspeech,
  author={B. Ramani and M. P. Actlin Jeeva and P. Vijayalakshmi and T. Nagarajan},
  title={{Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={775--779},
  doi={10.21437/Interspeech.2014-179},
  issn={2308-457X}
}