ISCA Archive Eurospeech 1997

Automatically clustering similar units for unit selection in speech synthesis

Alan W. Black, Paul Taylor

This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit, offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and its justification are presented. The results of experiments on two different databases, used to optimise various parameters within the system, are given. A comparison with other existing selection-based synthesis techniques is also given, showing the advantages of this method. The method is implemented within a full text-to-speech system, offering efficient, natural-sounding speech synthesis.
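The path search described in the abstract can be illustrated with a minimal dynamic-programming sketch. This is not the authors' implementation: the function names (`euclid`, `select_units`), the `join_weight` parameter, and the use of a single feature vector per unit with Euclidean distance for both the cluster-center distance and the join cost are all assumptions made for illustration. A real system would compute the join cost from acoustic frames at the unit boundaries.

```python
import math

def euclid(a, b):
    # Euclidean distance between two equal-length feature vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_units(candidates, centers, join_weight=1.0):
    """Viterbi-style search over candidate units.

    candidates[i] is the list of candidate feature vectors for target i
    (the members of the cluster chosen for that target); centers[i] is
    the center of that cluster. Both are hypothetical data layouts.
    Returns (indices of the chosen candidate per target, total cost).
    """
    # best[i][j] = (cheapest cost of a path ending at candidate j of target i,
    #               index of the previous candidate on that path)
    best = [[(euclid(u, centers[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            target_cost = euclid(u, centers[i])
            # Cheapest way to arrive at u from any candidate of the previous target.
            arrive, prev = min(
                (best[i - 1][k][0] + join_weight * euclid(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((arrive + target_cost, prev))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = [j]
    for i in range(len(best) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return path, total
```

In this sketch, setting `join_weight` to zero reduces the search to picking, for each target independently, the candidate closest to its cluster center; a larger weight favours sequences of units that join smoothly.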


doi: 10.21437/Eurospeech.1997-219

Cite as: Black, A.W., Taylor, P. (1997) Automatically clustering similar units for unit selection in speech synthesis. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 601-604, doi: 10.21437/Eurospeech.1997-219

@inproceedings{black97_eurospeech,
  author={Alan W. Black and Paul Taylor},
  title={{Automatically clustering similar units for unit selection in speech synthesis}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={601--604},
  doi={10.21437/Eurospeech.1997-219},
  issn={1018-4074}
}