ISCA Archive Interspeech 2014

Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis

Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi

This paper proposes a novel transform mapping technique based on shared decision tree context clustering (STC) for HMM-based cross-lingual speech synthesis. In conventional cross-lingual speaker adaptation based on state mapping, the adaptation performance is not always satisfactory when there are language and speaker mismatches between the average voice models of the input and output languages. In the proposed technique, we alleviate the effect of these mismatches on the transform mapping by introducing a language-independent decision tree constructed by STC, and we represent the average voice models using both language-independent and language-dependent tree structures. We also use a bilingual speech corpus to keep speaker characteristics consistent between the average voice models of the different languages. The experimental results show that, compared with state mapping, the proposed technique decreases both spectral and prosodic distortions between original and generated parameter trajectories and significantly improves the naturalness of synthetic speech while maintaining speaker similarity.


doi: 10.21437/Interspeech.2014-178

Cite as: Nagahama, D., Nose, T., Koriyama, T., Kobayashi, T. (2014) Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis. Proc. Interspeech 2014, 770-774, doi: 10.21437/Interspeech.2014-178

@inproceedings{nagahama14_interspeech,
  author={Daiki Nagahama and Takashi Nose and Tomoki Koriyama and Takao Kobayashi},
  title={{Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={770--774},
  doi={10.21437/Interspeech.2014-178},
  issn={2308-457X}
}