Vocal attractiveness, an important indicator of personal traits, has rarely been explored from the perspective of foreign listeners. In this study, native English speakers and Chinese university students evaluated synthetic English utterances with manipulated voice quality, formant dispersion, pitch shift, and pitch range in same-sex and opposite-sex contexts. Although the results showed some deviant features, English and Chinese subjects followed the principle of body size projection to varying degrees: they preferred breathiness in female voices, which signals a small body size, and narrower formant dispersion in male voices, which signals a large body size. Breathiness was also preferred in male voices, where it reduces the aggressiveness implied by other body-size indicators. However, there were noteworthy differences between the two groups. Overall, Chinese subjects gave higher mean ratings to voices of both genders and showed weaker dimorphism in their preferences than English subjects. Furthermore, for same-sex voices, English women and Chinese men gave significantly lower ratings than listeners of the opposite gender within the same group. These cross-linguistic differences could stem from various linguistic, cultural, psychological, and educational factors, which require further examination given technical limitations on the synthesis of female voices.