This study conducted a comprehensive feature analysis using a validated audiometry corpus comprising 450 Mandarin Chinese disyllabic words across five emotional states, ``Angry,'' ``Sad,'' ``Happy,'' ``Fearful,'' and ``Neutral,'' produced by one male and one female speaker. Employing machine-learning tools, the study identified and characterized the acoustic-prosodic features crucial to emotional vocalization. The results revealed several key points. First, the models showed that fear was acoustically the most recognizable emotion, whereas happiness was the most difficult to identify. Second, in the identification of Mandarin emotional prosody, spectral characteristics such as formant energy ratios were of primary importance, followed by F0-related parameters such as the 20th and 80th percentiles of F0. Third, the formant energy ratios indicated mainly that fearful voices were more turbulent, while the F0-related features suggested a general rise in pitch for emotional speech. Moreover, considerable cross-speaker variation in affective vocalization strategies was observed, reflected in the distinct feature patterns that the two speakers exploited in their emotional expressions.

Despite the considerable number of audio samples gathered from each speaker, the current corpus remains limited by its two-speaker scale. Ongoing work is therefore expanding the corpus with additional speakers, and the scalability and replicability of the paradigm should allow it to transfer readily to future investigations.
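To make the feature set concrete, the sketch below illustrates how the two feature families discussed above might be extracted from a single recording. It is a minimal illustration rather than the pipeline used in this study: librosa's pYIN tracker stands in for whatever F0 extractor was actually used, a coarse spectral band-energy ratio stands in for the formant energy ratios, and the file path, F0 search range, and band edges are all illustrative assumptions.

\begin{verbatim}
import numpy as np
import librosa

def extract_features(path):
    """Sketch: F0 percentiles and a band-energy ratio for one recording."""
    y, sr = librosa.load(path, sr=None)

    # F0 contour via probabilistic YIN; unvoiced frames come back as NaN.
    # The 65-600 Hz search range is an assumption for adult speech.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=65.0, fmax=600.0, sr=sr
    )

    # 20th and 80th F0 percentiles over voiced frames, as in the study.
    f0_p20 = np.nanpercentile(f0, 20)
    f0_p80 = np.nanpercentile(f0, 80)

    # Crude proxy for a formant energy ratio: energy in a low band
    # (roughly around F1) relative to a higher band (roughly F2-F3).
    # The 0-1 kHz vs. 1-3 kHz split is an illustrative assumption.
    S = np.abs(librosa.stft(y)) ** 2
    freqs = librosa.fft_frequencies(sr=sr)
    low = S[(freqs >= 0) & (freqs < 1000)].sum()
    high = S[(freqs >= 1000) & (freqs < 3000)].sum()
    energy_ratio = low / (high + 1e-10)  # avoid division by zero

    return {"f0_p20": f0_p20, "f0_p80": f0_p80,
            "energy_ratio": energy_ratio}

# Example (hypothetical file name):
# features = extract_features("angry_male_001.wav")
\end{verbatim}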
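Similarly, a feature-importance ranking of the kind summarized above could in principle be obtained with a generic classifier plus permutation importance, as sketched next. The random-forest choice is an illustrative stand-in rather than the model reported here, and \texttt{X}, \texttt{y}, and \texttt{names} are assumed to come from a feature-extraction step like the one above.

\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_features(X, y, names):
    """Sketch: rank acoustic-prosodic features by permutation importance.

    X: (n_tokens, n_features) feature matrix; y: emotion labels;
    names: feature names aligned with the columns of X.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_tr, y_tr)

    # Permutation importance: accuracy drop when a feature is shuffled.
    imp = permutation_importance(
        clf, X_te, y_te, n_repeats=20, random_state=0
    )
    order = np.argsort(imp.importances_mean)[::-1]
    return [(names[i], float(imp.importances_mean[i])) for i in order]
\end{verbatim}

Under this sketch, the features with the largest mean accuracy drop would correspond to the primary cues, in the study's case the formant energy ratios followed by the F0 percentiles.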