This paper introduces a framework for assessing the quality of synthetic speech by means of speech and speaker recognition techniques. The basic idea is that a quality judgement is closely related to the speech recognition process, which is in principle a pattern recognition task: auditory perception of speech can be described as a comparison of an unknown speech pattern with the listener's internal 'reference database' of known speech patterns. The important question is which features of the speech patterns are responsible for, e.g., speaker variability (and thus the distinction between synthetic and natural speech) and for speech quality in its global sense. The framework can be summarized as follows. A perception-based analysis of speech samples from many speakers builds a reference feature space. Samples from speech synthesizers are classified with regard to this reference set, and distance measures are computed. These distance measures are then compared with subjective ratings obtained in listening tests, and the comparison shows to what extent the distance measures can predict the subjective ratings. The question is closely related to issues in speech and speaker recognition research, but it approaches them from a different perspective.
Keywords: Speech Quality Assessment, Speech Synthesis, Speech Recognition
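
The processing chain outlined in the abstract (reference feature space, distance of synthetic samples to that space, comparison with listening-test ratings) might be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not specify the features, the distance measure, or the comparison statistic, so the Mahalanobis distance and the Pearson correlation are stand-ins chosen here, and all function names, feature vectors, and ratings are hypothetical toy values rather than data from the study.

```python
import numpy as np


def build_reference(features):
    """Summarize natural-speech feature vectors (one row per sample)
    by their mean and (pseudo-)inverse covariance.
    ASSUMPTION: a single Gaussian-like reference space; the paper's
    perception-based analysis is not specified in the abstract."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pinv tolerates a singular covariance
    return mean, inv_cov


def distance_to_reference(sample, mean, inv_cov):
    """Mahalanobis distance of one feature vector to the reference set.
    ASSUMPTION: Mahalanobis distance stands in for the unspecified measure."""
    diff = np.asarray(sample, dtype=float) - mean
    return float(np.sqrt(diff @ inv_cov @ diff))


def pearson(x, y):
    """Pearson correlation between distances and subjective ratings."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical toy data: 200 'natural' samples with 3 features each.
rng = np.random.default_rng(0)
natural = rng.normal(size=(200, 3))
mean, inv_cov = build_reference(natural)

# Hypothetical synthesizer feature vectors, increasingly far from the reference.
synthetic = np.array([[0.5, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [2.0, 2.0, 2.0],
                      [4.0, 4.0, 4.0]])
distances = [distance_to_reference(s, mean, inv_cov) for s in synthetic]

# Hypothetical listening-test ratings (higher = better), invented for illustration.
ratings = [4.5, 3.8, 2.5, 1.2]

r = pearson(distances, ratings)
print("distances:", np.round(distances, 2))
print("correlation with ratings:", round(r, 2))
```

In this toy setting a strongly negative correlation would mean that samples farther from the natural-speech reference received lower ratings; whether such a relation holds for real synthesizers is exactly what the comparison with listening tests is meant to establish.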