The quality of the synthetic speech produced by concatenation-based PSOLA text-to-speech systems depends on two main factors: the richness and adequacy of the generated prosodic contours, and the choice of the speech units to be concatenated. This paper addresses the second issue: the building, segmentation and evaluation of high-quality speech unit inventories in several languages. The standard CNET speech unit inventories for French, Spanish and German contain diphones and a supplementary set of triphones and quadriphones that include highly coarticulated sounds. These three inventories were segmented in two ways: manually by phoneticians, and automatically by an HMM-based segmentation procedure. For the three languages, 85% of the segmentation marks found by the automatic method were correct when compared to those found by the manual method. A global quality test comparing the two versions shows that, except for German, subjective listener preference ratings are not significantly different.
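
As an illustration of how automatic segmentation marks can be scored against manual ones, the following minimal sketch counts an automatic mark as correct when it lies within a fixed tolerance of the corresponding manual mark. The abstract does not state the criterion used to judge a mark correct, so the 20 ms tolerance, the one-to-one pairing of marks, the function name `boundary_agreement` and the sample values are all assumptions made for illustration only.

```python
from typing import Sequence


def boundary_agreement(
    manual_ms: Sequence[int],
    automatic_ms: Sequence[int],
    tolerance_ms: int = 20,  # assumed tolerance; not given in the abstract
) -> float:
    """Return the fraction of automatic marks within tolerance_ms of the manual mark.

    Marks are paired by position, which assumes both segmentations
    place the same number of marks in the same order.
    """
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("both segmentations must contain the same number of marks")
    if not manual_ms:
        raise ValueError("at least one mark is required")

    # Count the pairs whose absolute time difference is within the tolerance.
    hits = sum(
        1 for m, a in zip(manual_ms, automatic_ms) if abs(m - a) <= tolerance_ms
    )
    return hits / len(manual_ms)


# Hypothetical mark positions in milliseconds (not data from the paper).
manual = [100, 250, 400, 620]
automatic = [105, 245, 440, 625]

# Three of the four automatic marks fall within 20 ms of the manual ones.
rate = boundary_agreement(manual, automatic)
```

With these made-up values, `rate` is 0.75. A stricter or looser tolerance changes the score, which is why any agreement figure should be read together with the criterion behind it.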