ISCA Archive Interspeech 2015

Simultaneous optimization of multiple tree structures for factor analyzed HMM-based speech synthesis

Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Some speech synthesis approaches are based on the assumption that voice characteristics, e.g., speaker, speaking style, and emotion, can be represented in a low-dimensional subspace. In these approaches, the model structures of the basis vectors that span the subspace are typically constructed with decision trees, and these structures are important for synthesizing high-quality speech. However, since it is difficult to evaluate all candidate model structures, strong constraints are usually imposed during model construction to reduce the huge computational cost. To overcome this problem, this paper presents a new technique that simultaneously constructs the model structures with multiple tree structures without these constraints. Because computational complexity reduction algorithms allow more complex model structure candidates to be evaluated, the proposed technique can find better model structures. Experimental results show that the proposed method improves the naturalness of the synthesized speech compared with the conventional method.
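As a rough illustration of the subspace idea in the first sentence, the sketch below assumes a generic eigenvoice-style form in which a mean vector is a bias plus a weighted sum of basis vectors, with the weights encoding the voice characteristics. This is an assumption for illustration, not the paper's exact formulation: the function `speaker_mean` and the toy numbers are hypothetical, and the sketch does not model the decision trees that determine the model structures of the basis vectors.

```python
from typing import List


def speaker_mean(bias: List[float], basis: List[List[float]], weights: List[float]) -> List[float]:
    """Return bias + sum_k weights[k] * basis[k].

    Hypothetical eigenvoice-style form: the weights are low-dimensional
    coordinates of a voice in the subspace spanned by the basis vectors.
    """
    if len(basis) != len(weights):
        raise ValueError("one weight per basis vector is required")
    mean = list(bias)
    for w, b in zip(weights, basis):
        if len(b) != len(mean):
            raise ValueError("basis vectors must match the bias dimension")
        # Accumulate the weighted basis vector into the mean.
        for d in range(len(mean)):
            mean[d] += w * b[d]
    return mean


# Two basis vectors span a 2-D subspace inside a 3-D feature space.
bias = [1.0, 0.0, -1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]
print(speaker_mean(bias, basis, [0.5, -1.0]))  # prints [1.5, -2.0, -2.0]
```

Changing only the two weights moves the mean within the subspace, which is why a low-dimensional weight vector can represent a speaker, style, or emotion.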