We compare automatic speech recognition (ASR) with human speech recognition (HSR) based on speech material that is traditionally used for diagnosing hearing deficits. Specifically, we quantify the human-machine gap with sentences in noise for different model sizes and two languages supported by Whisper, i.e., German and English. For German speech, we also determine the gap in different rooms, in the presence of reverberation and spatially localized noise. Results are put in the context of audiological diagnosis using the speech reception threshold (SRT), i.e., the SNR at which a 50% word recognition rate is achieved. We find that the largest ASR system performs like a mildly hearing-impaired listener when exposed to non-spatial, unpredictable US-English sentences, and that using German speech degrades the SRT by 4.9 dB. In reverberant rooms, the gap reaches at least 5.9 dB. Based on the language bias, we estimate that the same model can outperform normal-hearing listeners in anechoic conditions for US-English.