Many young children prefer speech-based interfaces to text-based ones, as they are relatively slow and error-prone when entering text. However, automatic speech recognition (ASR) for children can be challenging due to the lack of transcribed children's speech corpora. In this paper, we investigate a voice conversion method based on the WORLD vocoder to generate childlike speech for data augmentation. Since noise may lead to severe artifacts in converted speech, we also investigate using speech enhancement to improve the quality of the converted speech. On a publicly available children's speech corpus, we evaluated the proposed data augmentation method against existing data augmentation methods based on linear prediction coefficients. The proposed method substantially outperformed the prior work on children's ASR. Additionally, on an adult-versus-child speaker classification task, data generated using the proposed method mimicked real children's speech better than data generated using the reference methods.