Automatic speech recognition (ASR) is increasingly used to promote inclusion and accessibility. Nonetheless, prior work on ASR finds performance gaps conditioned by speaker gender and race, revealing systematic biases in modern ASR systems. That work, however, has focused on native varieties of English, glossing over the impact of these biases on a wider range of ASR users, namely second language speakers of English. The present work compares the performance of the transcription system Otter on 24 varieties of English, 21 of which are non-native varieties. We compare the word and phone error rates (WER/PER) of accent varieties that are claimed to be supported by Otter with those of unsupported varieties. Results show that supported English varieties have lower WERs than unsupported varieties. However, there are still systematic differences in performance conditioned by linguistic structure in both supported and unsupported Englishes. Specifically, Otter performs better on English varieties spoken by speakers whose first language is non-tonal. We conclude that while including more varieties of English in ASR training data may promote inclusivity, there may still be biases inherent to linguistic structure that should not be overlooked.
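For readers unfamiliar with the metrics, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to turn the reference transcript into the ASR output, and N is the number of reference words. PER is the same ratio computed over phone sequences. The following is a minimal sketch of that standard definition, not the scoring pipeline used in this study: the function names, the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(S + D + I) / N over arbitrary tokens: words for WER, phones for PER."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """WER with naive normalization (assumption: lowercase, split on spaces)."""
    return error_rate(reference.lower().split(), hypothesis.lower().split())


if __name__ == "__main__":
    # Illustrative strings only, not data from the study.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # PER on a hypothetical phone transcription of a single word.
    print(error_rate(["K", "AE", "T"], ["K", "AH", "T"]))
```

Because the rate is normalized by reference length, it can exceed 1.0 when the hypothesis contains many insertions. Comparing group-level rates, such as supported versus unsupported varieties, would additionally require a consistent text normalization across all transcripts.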