Native speakers of non-tonal languages, such as American English, frequently have difficulty accurately producing the tones of Mandarin Chinese. This paper describes a corpus of Mandarin Chinese spoken by non-native speakers and annotated for tone quality using a simple good/bad system. We examine inter-rater correlation of the annotations and highlight the differences in feature distribution between native, good non-native, and bad non-native tone productions. We find that the features of tones judged by a simple majority to be bad are significantly different from features from tones judged to be good, and tones produced by native speakers.