There is emerging evidence that extended high frequencies (EHFs; >8 kHz) improve speech perception in noise, yet the mechanisms underlying this benefit remain unclear. We investigated whether EHFs contribute to phoneme recognition using an automatic speech recognition (ASR) model. A neural network was trained to decode phonemes from cochleagrams of broadband speech and of speech low-pass filtered at 8 or 6 kHz, in quiet and in masked conditions with varying target-to-masker ratios (TMRs) and target-masker spatial separations. Compared with filtered speech, broadband speech improved phoneme recognition accuracy in masked conditions, particularly at lower TMRs, but conferred no benefit in quiet. Removing EHFs increased the probability that the model omitted a phoneme, more so for consonants than for vowels. These findings suggest that the EHF benefit in adverse listening conditions may arise in part from enhanced phoneme processing, highlighting the potential of improving audiometry and ASR by including EHFs.
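The low-pass conditions above can be illustrated with a minimal sketch. The abstract does not specify the filter design, so the Butterworth type, filter order, and function names here are assumptions for illustration only:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(signal, fs, cutoff_hz, order=8):
    """Zero-phase Butterworth low-pass filter (hypothetical design choice;
    the actual filter used in the study is not specified here)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Synthetic example: a 4 kHz tone (passband) plus a 12 kHz tone (EHF band).
fs = 48_000                      # sample rate high enough to represent EHFs
t = np.arange(fs) / fs           # 1 s of samples
x = np.sin(2 * np.pi * 4_000 * t) + np.sin(2 * np.pi * 12_000 * t)
y = lowpass_speech(x, fs, cutoff_hz=8_000)  # the 8 kHz condition
```

After filtering, the 12 kHz (EHF) component is strongly attenuated while the 4 kHz component is preserved, mimicking the 8 kHz low-pass condition applied to the speech stimuli.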