Within the scope of our oral reading exercise for children aged 5 to 8, models must be able to precisely detect and diagnose reading mistakes, which remains a considerable challenge even for state-of-the-art ASR systems. In this paper, we compare hybrid and end-to-end acoustic models trained for phoneme recognition on young learners' speech. We evaluate them not only with phoneme error rates (PER) but also through detailed phoneme-level misread detection and diagnostic metrics. We show that a traditional TDNNF-HMM model, despite a high PER, is the best at detecting reading mistakes (F1-score 72.6%), but at the cost of low precision (73.8%) and specificity (74.7%). This is a pedagogically critical weakness, since it means that correct readings are wrongly flagged as mistakes. A recent Transformer+CTC model, to which we applied our synthetic reading-mistake augmentation method, obtains the highest precision (81.8%) and specificity (86.3%), as well as the highest correct diagnosis rate (70.7%), making it the best fit for our application.
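To make the detection and diagnostic metrics more concrete, here is a minimal Python sketch of how they could be computed. It assumes that the expected phonemes, the phonemes the child actually produced (from annotation), and the phonemes the model recognized have already been aligned position by position. It labels a phoneme as a true misread when the produced phoneme differs from the expected one, and as a detected misread when the recognized phoneme differs from the expected one. The names `misread_metrics` and `MisreadMetrics` are hypothetical. The definition of the correct diagnosis rate used here is also an assumption and may not match the paper's exact definition: it is the share of correctly detected misreads whose recognized phoneme matches the phoneme actually produced.

```python
from dataclasses import dataclass


@dataclass
class MisreadMetrics:
    precision: float       # TP / (TP + FP): share of flagged phonemes that were real misreads
    recall: float          # TP / (TP + FN): share of real misreads that were flagged
    f1: float              # harmonic mean of precision and recall
    specificity: float     # TN / (TN + FP): share of correct readings left unflagged
    diagnosis_rate: float  # hypothetical definition: correct diagnoses / TP


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def misread_metrics(expected, annotated, recognized) -> MisreadMetrics:
    """Phoneme-level misread detection metrics (illustrative sketch).

    Assumes all three sequences are already aligned phoneme by phoneme.
    """
    if not (len(expected) == len(annotated) == len(recognized)):
        raise ValueError("sequences must be aligned to the same length")

    tp = fp = fn = tn = correct_diag = 0
    for exp, ann, rec in zip(expected, annotated, recognized):
        is_misread = ann != exp  # the child did not produce the expected phoneme
        flagged = rec != exp     # the model reports a deviation from the expected phoneme
        if is_misread and flagged:
            tp += 1
            if rec == ann:       # the model also identified which phoneme was produced
                correct_diag += 1
        elif flagged:
            fp += 1              # false alarm on a correctly read phoneme
        elif is_misread:
            fn += 1              # missed reading mistake
        else:
            tn += 1

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    specificity = safe_div(tn, tn + fp)
    return MisreadMetrics(precision, recall, f1, specificity, safe_div(correct_diag, tp))


if __name__ == "__main__":
    # Toy example: the child says /o/ instead of /a/, and the model hears /d/ instead of /t/.
    print(misread_metrics(["k", "a", "t", "s"], ["k", "o", "t", "s"], ["k", "o", "d", "s"]))
```

In this toy example, the /a/→/o/ substitution is detected and correctly diagnosed. The spurious /t/→/d/ is a false alarm, so precision and specificity drop even though recall is perfect. This mirrors why low precision and specificity matter pedagogically.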