ISCA Archive ICSLP 2002

A context clustering technique for average voice model in HMM-based speech synthesis

Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi

This paper describes a new technique for constructing a decision tree used for clustering an average voice model, i.e., speaker-independent speech units. In this technique, we first train speaker-dependent models using a multi-speaker speech database, and then construct a speaker-independent decision tree for context clustering that is common to these speaker-dependent models. When a node of the decision tree is split, only the context-related questions that can split the node for all speaker-dependent models are adopted. Consequently, every node of the decision tree has training data from all speakers. From the results of a paired comparison test, we show that the average voice model trained using the proposed technique can synthesize more natural-sounding speech than the conventional average voice model.
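The question-selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the context label format, the function names `splits_for_all_speakers` and `admissible_questions`, and the `min_count` threshold are all assumptions made for illustration, and the sketch covers only the filtering step, not the criterion used to rank the questions that pass it.

```python
from typing import Callable, Dict, List

# Hypothetical context label format "left-center+right", e.g. "a-k+i".
Context = str
Question = Callable[[Context], bool]


def splits_for_all_speakers(
    node_data: Dict[str, List[Context]],
    question: Question,
    min_count: int = 1,  # assumed threshold, not taken from the paper
) -> bool:
    """Return True if the question leaves every speaker with at least
    `min_count` samples in both the yes-child and the no-child."""
    for contexts in node_data.values():
        yes = sum(1 for c in contexts if question(c))
        no = len(contexts) - yes
        if yes < min_count or no < min_count:
            return False
    return True


def admissible_questions(
    node_data: Dict[str, List[Context]],
    questions: Dict[str, Question],
) -> List[str]:
    """Keep only the questions that can split the node for all speakers."""
    return [
        name
        for name, q in questions.items()
        if splits_for_all_speakers(node_data, q)
    ]


def center(c: Context) -> str:
    """Extract the center phone from a "left-center+right" label."""
    return c.split("-")[1].split("+")[0]


# Toy node: the training contexts that each speaker contributes to it.
node = {
    "spk1": ["a-k+i", "o-s+a", "i-t+e"],
    "spk2": ["a-k+u", "e-t+o"],
}
questions = {
    "center_is_k": lambda c: center(c) == "k",
    "center_is_s": lambda c: center(c) == "s",
}

print(admissible_questions(node, questions))  # ['center_is_k']
```

In this toy node, `center_is_s` is rejected because `spk2` has no sample with center phone `s`, so the yes-child would contain no data from that speaker. Applying the filter at every split is what keeps each node populated with data from all speakers.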