Current applications of Automatic Speech Recognition (ASR)-based technology for second language learning often require comparisons of native and non-native speech for evaluation and feedback purposes. These comparisons are generally based on limited sets of features that might not be optimal for characterizing native as opposed to non-native speech. In the present study, we conducted a systematic comparison based on a large number of standardized acoustic and temporal features. The main aim was to gain insight into which features best distinguish native from non-native speech. In turn, this knowledge can be employed to develop classifiers that are better suited to evaluating non-native speech and to provide a solid basis for delivering feedback aimed at improving speech production. The findings indicate that most of the investigated features differ significantly between native and non-native speech and that the temporal features are also distinctive. We discuss these results in relation to previous research and outline avenues for future investigations.