An experiment is described that evaluates the performance of: 1) specifically defined speech units, compared with simple "ideal" diphones, for synthesizing vowel-to-vowel coarticulations and sonorant consonant clusters; 2) "allodiphones" for synthesizing the stressed mid-vowel allophones /'E/ and /'O/.
Twenty test words were synthesized by concatenating properly segmented speech units and grouped into 23 pairs, which were evaluated in subjective tests using a three-level paired comparison method. Both "trained" and "untrained" listeners could either prefer one of the two stimuli in each pair or express no preference.
The results show that, in particular contexts, triphones provide a better fit for complex coarticulations, while the allophones of mid vowels and /r/ require proper "allodiphones" if Italian text-to-speech synthesis of good acoustic quality is to be achieved.
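
As an illustration of how responses from a three-level paired comparison of this kind might be tallied, a minimal Python sketch follows. The vote labels ("A", "B", "0"), the function names and the signed preference score are assumptions introduced here for illustration only; they are not taken from the experiment, and the votes in the usage note are invented.

```python
from collections import Counter

# Hypothetical labels for the three response levels (an assumption):
#   "A" = first stimulus of the pair preferred
#   "B" = second stimulus of the pair preferred
#   "0" = no preference
LEVELS = ("A", "B", "0")


def tally_pair(votes):
    """Return the share of listeners choosing each level for one stimulus pair."""
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown vote labels: {sorted(unknown)}")
    counts = Counter(votes)
    total = len(votes)
    return {level: counts[level] / total for level in LEVELS}


def preference_score(votes):
    """Signed score in [-1, 1]: positive favours "A", negative favours "B"."""
    shares = tally_pair(votes)
    return shares["A"] - shares["B"]
```

With four invented votes, `tally_pair(["A", "A", "B", "0"])` gives shares of 0.5, 0.25 and 0.25 for "A", "B" and "0", and `preference_score` returns 0.25. Keeping the "no preference" share separate, rather than splitting it between the two stimuli, preserves the third response level in the summary.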