Acoustic modeling for child speech is challenging due to the high acoustic variability of child speech and the scarcity of publicly available datasets. In this work, we address these challenges by leveraging the abundant resources available for adult speech (well-trained acoustic models and transcribed speech datasets) and proposing a joint acoustic feature and model adaptation framework that minimizes the acoustic mismatch between adult and child speech. Empirical results on three tasks, namely speech recognition, pronunciation assessment, and fluency assessment, show that our proposed approach consistently outperforms competitive baselines, achieving up to 31.18% phone error reduction on speech recognition and around 7% gains on the speech evaluation tasks.