Today’s end-to-end (E2E) automatic speech recognition (ASR) models achieve strong performance on adult speech but deteriorate on children’s speech. Most E2E ASR models are pre-trained on adult speech, introducing an age mismatch that can be addressed by fine-tuning on child data. However, because child datasets are scarce, fine-tuning on children’s speech may introduce new domain shifts, such as a speaking-style mismatch. In this work, we explore mixed fine-tuning on partially matched data, namely read adult speech and spontaneous children’s speech, to improve E2E ASR performance on read children’s speech. We isolate the individual effects of the age mismatch and the speaking-style mismatch and investigate the use of childrenization of read adult speech. Our proposed method reduces the word error rate (WER) by up to 5% absolute (21% relative) compared to the pre-trained E2E ASR model and by roughly 3% absolute (15% relative) compared to individual fine-tuning on the partially matched datasets.