In Swedish, the fundamental frequency contour (F0 contour) is known to be the main acoustic feature for word accent and sentence intonation. A model, based on an extension of the model for Japanese, is used to describe the generation process of F0 contours in Swedish. Two kinds of commands are assumed as input to this model: phrase commands, which are positive impulses except at the end of an utterance, and accent commands, which are stepwise functions of either polarity. Analysis-by-synthesis of F0 contours of both isolated words and sentences, uttered by two native speakers from the Stockholm region, indicated that the model can always generate very close approximations to the observed F0 contours, and that the extracted parameters are systematically related to the underlying lexical word accent, syntactic structure, and focus. Furthermore, the model is introduced into a framework of text-to-speech conversion for Swedish, and an outline is given for the derivation of the F0 model parameters.
Keywords: intonation, prosody, text-to-speech conversion, speech synthesis, fundamental frequency contour, analysis-by-synthesis.
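
As an illustration of the kind of model described above, the following is a minimal sketch of a command-response F0 model in which impulse-shaped phrase commands and step-shaped accent commands are superposed in the logarithmic F0 domain. It assumes the commonly published formulation of the model for Japanese (critically damped second-order responses); the function names, the tuple layout of the commands, and the default constants `alpha`, `beta`, and `gamma` are illustrative assumptions and are not taken from this paper. Signed amplitudes are allowed, so that accent commands of either polarity, and a non-positive phrase command at the end of an utterance, can be represented.

```python
import math


def phrase_response(t, alpha=3.0):
    """Response to a unit phrase impulse (assumed critically damped form)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Response to a unit accent step, clipped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 values (Hz) at the given times (s).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, amplitude); amplitude may be negative
    accent_cmds -- list of (onset_time, offset_time, amplitude);
                   amplitude may be of either polarity
    All constants are illustrative assumptions, not values from the paper.
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        # Each phrase command contributes an impulse response.
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset, alpha)
        # Each accent command is a step that switches on and then off.
        for onset, offset, amp in accent_cmds:
            log_f0 += amp * (accent_response(t - onset, beta, gamma)
                             - accent_response(t - offset, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical usage: one phrase command and one negative accent command.
times = [i * 0.01 for i in range(150)]
contour = f0_contour(times, fb=100.0,
                     phrase_cmds=[(0.0, 0.5)],
                     accent_cmds=[(0.3, 0.6, -0.3)])
```

In an analysis-by-synthesis setting, the command timings and amplitudes would be adjusted until the generated contour matches a measured one; that search procedure is not shown here.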