ISCA Archive Eurospeech 1995

Confusions among Italian consonants in good and in telephone conditions: differences between text-to-speech systems and natural speech with noise

Cristina Delogu, Andrea Paoloni, Paola Ridolfi

Natural and synthetic voices differ in several respects, as the results of many experiments show. These differences may stem from differences in the acoustic-phonetic structure of the two signals. To investigate this hypothesis, we ran a consonant confusion test for 19 Italian consonants produced by a natural voice with noise (3 S/N ratios) and by 6 TTS systems presented through good and telephone channels. The results showed that the distributions of consonant confusions for natural and synthetic speech (both formant-based and diphone-based synthesis) were often quite different, suggesting some contradictions in the acoustic cues and in the coarticulation model of the synthetic signals.
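As an illustration of how confusion distributions of this kind can be compared, the following minimal Python sketch builds a confusion matrix from (stimulus, response) pairs and measures how far the response distribution for each consonant differs between two conditions. It is not the authors' analysis: the three-consonant inventory, the listener responses, the function names, and the use of total variation distance are all assumptions made for the example.

```python
from collections import Counter

def confusion_matrix(trials, labels):
    """Count responses per stimulus; rows are stimuli, columns are responses."""
    counts = Counter(trials)
    return [[counts[(s, r)] for r in labels] for s in labels]

def row_distributions(matrix):
    """Turn each row of counts into a probability distribution over responses."""
    dists = []
    for row in matrix:
        total = sum(row)
        dists.append([c / total if total else 0.0 for c in row])
    return dists

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical toy data: (presented consonant, perceived consonant).
# These are NOT results from the paper.
labels = ["p", "t", "k"]
natural = [("p", "p"), ("p", "t"), ("t", "t"), ("t", "t"), ("k", "k"), ("k", "p")]
synthetic = [("p", "p"), ("p", "k"), ("t", "t"), ("t", "k"), ("k", "k"), ("k", "k")]

nat = row_distributions(confusion_matrix(natural, labels))
syn = row_distributions(confusion_matrix(synthetic, labels))

# Per-consonant distance between the natural and synthetic confusion patterns.
distances = {c: total_variation(nat[i], syn[i]) for i, c in enumerate(labels)}
print(distances)
```

A distance of 0 means listeners confused a consonant in the same way in both conditions, while larger values indicate that the errors fall on different consonants, which is the kind of difference the abstract reports.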