Although electronic speech synthesis now has a tradition of several decades, there is still no agreement on the most suitable structure for a speech synthesizer. In this paper we will compare several structures that have been used by workers in the field. As all of these appear to have some drawbacks, we will propose an alternative structure that should solve at least some of the problems.
The single most important axiom underlying our work is the conviction that the development of synthesis rules will be much easier and less time-consuming if optimal use can be made of existing phonetic knowledge. This knowledge happens to be formulated either in terms of articulatory postures and movements or in terms of formant patterns. Drawing on the acoustic theory of speech production [1,2], it is not too difficult to translate articulatory data into formant patterns. The reverse transformation, from formant patterns into articulatory configurations, is more difficult; moreover, the result is not necessarily unique [3]. This is one reason why articulatory synthesis has received much less attention than formant synthesis or terminal analog synthesis. In this paper the discussion will be restricted to terminal analog synthesis and, more specifically, to formant synthesis. The use of linear prediction parameters such as reflection coefficients or log area ratios is not considered because, despite their suggestive names, the relation of these parameters to actual vocal tract configurations is, at best, disputable.
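To make the notion of formant synthesis concrete, the following is a minimal sketch of the basic building block of a terminal analog formant synthesizer: a second-order digital resonator tuned to one formant frequency and bandwidth. The difference equation and coefficient formulas are the standard ones from the digital resonator literature; the function names, sample rate, and formant values are illustrative choices, not taken from this paper.

```python
import math

def resonator_coeffs(f, bw, fs):
    """Coefficients of a second-order digital resonator realizing one
    formant at frequency f (Hz) with bandwidth bw (Hz) at sample rate
    fs (Hz), via the difference equation
        y[n] = A*x[n] + B*y[n-1] + C*y[n-2].
    """
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)            # sets the bandwidth
    B = 2.0 * math.exp(-math.pi * bw * T) \
        * math.cos(2.0 * math.pi * f * T)             # sets the frequency
    A = 1.0 - B - C                                   # unity gain at 0 Hz
    return A, B, C

def formant_filter(x, f, bw, fs):
    """Pass a signal through one formant resonator (one section of a
    cascade terminal analog synthesizer)."""
    A, B, C = resonator_coeffs(f, bw, fs)
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = A * s + B * y1 + C * y2
        y.append(out)
        y2, y1 = y1, out
    return y
```

A full terminal analog synthesizer would cascade several such sections (one per formant) and drive them with a voicing or noise source; synthesis rules then amount to specifying trajectories of f and bw for each section over time.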