Prosody is an inherent suprasegmental feature of human speech that is used to express, for example, attitude, emotion, intent and attention. Pitch is the most important of the prosodic features. For Mandarin Chinese speech, pitch information is even more crucial because Mandarin is a tonal language in which the tone of each syllable is described by its pitch contour. In this paper, the concept of syllable-based eigenpitch is introduced and investigated using principal component analysis (PCA). The eigenpitch and the related eigen features are analyzed, and it is shown that the tonal patterns are preserved in the eigenpitch representation. Furthermore, we show that the dimension of pitch in the eigen space can be reduced while minimizing the energy loss of the original pitch contour. Finally, we briefly discuss the quantization properties of the eigenpitch representation. We also present experimental results obtained using a Mandarin speech database. They are in line with the theoretical reasoning and further demonstrate the usefulness of the proposed pitch modeling technique.

1 Introduction

The term prosody refers to certain properties of a speech signal that are related to audible changes in pitch, loudness and duration. Among these features, pitch usually plays the most important role. Physically, the pitch of an utterance depends on the rate of vibration of the vocal cords: the higher the rate of vibration, the higher the resulting pitch. Another concept closely related to pitch is tone, which describes pitch variations within short stretches of syllables. In tonal languages, these relative pitch differences are used either to differentiate between word meanings or to convey grammatical distinctions. Many of the languages of Southeast Asia and Africa are tonal languages.
Mandarin Chinese is probably the most widely studied tonal language. Each stressed syllable has a significant contrastive pitch that is an integral part of the syllable. Mandarin has four basic tones: high level, high rising, dipping/falling and high falling. They are used to distinguish otherwise homophonous words, as shown in Table 1.

Word   Intonation   Meaning
ma     [--]         mother
ma     [/]          numbness
ma     [\/]         horse
ma     [\]          curse

Table 1. Examples of different tones in Mandarin Chinese.

The most commonly used numerical representation of tonal pitch contours is shown in Table 2. It consists of five pitch levels, rather like the staves used in music scores, labeled from 1 to 5 from the bottom upwards. The tonal patterns are captured with these reference pitch numbers by observing the start, the middle and the end points of the pitch contour [7].

[Contour column: pitch contour drawings on the five-level scale]

Type     Pattern   Feature
Tone 1   5-5       H-H (High)
Tone 2   3-5       L-H (Rising)
Tone 3   2-1-4     L-L (Low)
Tone 4   5-1       H-L (Falling)

Table 2. Tonal patterns and phonological notations of four citation tones in Mandarin Chinese.

Pitch information clearly plays a crucial role in speech synthesis systems, especially for tonal languages [3][8]. Because the pitch contour conveys information about word meaning as well as prosodic phrase and word boundaries, human listeners use pitch contour information to improve speech recognition performance [5]. Various techniques have also been proposed to improve the noise robustness of speech recognition systems by using pitch information [5]. For all of these reasons, pitch modeling is one of the key issues that must be addressed when dealing with tonal languages.

Most of the popular pitch modeling approaches separate the pitch contour into a global trend and a local variation. Two examples following this approach are the superpositional modeling technique [2] and the two-stage modeling technique [1].
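
As a rough illustration of the trend-plus-variation idea (not the specific methods of [1] or [2]), the following Python sketch fits a straight line to a synthetic pitch contour and treats the residual as the local variation. The function name `split_trend_and_local`, the choice of a linear trend, and the synthetic contour values are all assumptions made for this example.

```python
import numpy as np


def split_trend_and_local(f0, degree=1):
    """Split a pitch contour into a smooth global trend and a local residual.

    Hypothetical helper for illustration: the trend is a least-squares
    polynomial (a straight line by default) over normalized time.
    """
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(0.0, 1.0, f0.size)   # normalized time axis
    coeffs = np.polyfit(t, f0, degree)   # least-squares polynomial fit
    trend = np.polyval(coeffs, t)        # global component
    local = f0 - trend                   # what the trend does not explain
    return trend, local


# Synthetic utterance-level contour in Hz (assumed values, not real data):
# a slow downward drift plus faster syllable-sized pitch movements.
t = np.linspace(0.0, 1.0, 200)
declination = 220.0 - 40.0 * t
syllable_movement = 15.0 * np.sin(2.0 * np.pi * 4.0 * t)
f0 = declination + syllable_movement

trend, local = split_trend_and_local(f0)
```

By construction the two components sum back to the original contour, so nothing is lost in the split. Only the modeling of each component differs from one approach to another.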
In [6], the mean and the shape of the syllable pitch contours are taken as two basic modeling units, using a third-order orthogonal polynomial expansion. Since the syllable pitch contour patterns vary dramatically from their canonical form, a reasonable assumption is that a data-driven approach could preserve more precise and more relevant information than fitting a predefined functional form. In this paper, we propose a data-driven pitch modeling approach based on the concept of eigenpitch and study its properties to verify this assumption. In addition, we provide results related to tonal classification and pitch compression using the proposed modeling approach.

0-7803-8678-7/04/$20.00 ©2004 IEEE    ISCSLP 2004

The remainder of the paper is organized as follows. We first describe the process of eigenpitch extraction and some of the basic properties of the eigenpitch representation. Then, the performance of the proposed modeling approach in the tonal