In this paper, we present an improved vocal tract length perturbation (VTLP) algorithm as a data augmentation technique. VTLP is usually accomplished by adjusting the center frequencies of the mel filterbank, as in [1]. In contrast to this conventional approach, we re-synthesize waveforms from the frequency-warped spectra using overlap-add (OLA). This approach has two advantages. First, we can apply an “acoustic simulator” [2, 3] after performing the VTLP-based frequency warping. Second, we may use a window length for frequency warping that differs from the one used in feature processing. We observe that the best performance is obtained when the warping coefficients are drawn from the range 0.8 to 1.2 and the window length is 50 ms. We obtain 3.66% WER and 12.39% WER on the LibriSpeech test-clean and test-other sets, respectively, using an attention-based end-to-end speech recognition system without any language model (LM). Using shallow fusion with a Transformer LM, we achieve 2.44% WER and 8.29% WER on the same two sets. To the best of our knowledge, the 2.44% WER on test-clean is the best result reported on this test set to date.
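To make the frequency-warping step concrete, the following is a minimal sketch of a piecewise-linear warping function of the kind commonly used for VTLP, together with sampling of a warping coefficient from the range 0.8 to 1.2. It is an illustration under stated assumptions and not necessarily the exact procedure used in this work: the piecewise-linear form, the uniform sampling, the boundary frequency `f_hi`, the 8 kHz Nyquist frequency, and the function names `warp_frequency` and `sample_alpha` are all assumed here. The OLA re-synthesis step is not shown.

```python
import random


def warp_frequency(f, alpha, f_hi, nyquist):
    """Map frequency f (Hz) to a warped frequency using a piecewise-linear rule.

    Assumed form: below a boundary frequency the axis is scaled by alpha;
    above it, a second linear segment is used so that the Nyquist frequency
    maps to itself and the mapping stays continuous.
    """
    boundary = f_hi * min(alpha, 1.0) / alpha
    if f <= boundary:
        return alpha * f
    # Slope of the upper segment, chosen so that nyquist maps to nyquist.
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - f)


def sample_alpha(rng, low=0.8, high=1.2):
    """Draw one warping coefficient per utterance (uniform sampling is an assumption)."""
    return rng.uniform(low, high)


# Hypothetical usage: warp a few frequencies for one randomly drawn coefficient.
rng = random.Random(0)
alpha = sample_alpha(rng)
warped = [
    warp_frequency(f, alpha, f_hi=4800.0, nyquist=8000.0)
    for f in (0.0, 1000.0, 4000.0, 8000.0)
]
```

In a waveform-level pipeline such as the one described above, a mapping like this would be applied to the frequency axis of each short-time spectrum (for example, with a 50 ms analysis window) before the frames are recombined by OLA.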