Perceptual training with multitalker babble can benefit first language (L1) listeners; however, it is unclear whether such training also benefits second language (L2) listeners and whether there is an optimal number of talkers for creating babble. Because perception experiments with human listeners are complex and time-consuming, and because neural models are inspired by the human brain, we explore the use of neural models and examine how well their performance aligns with results from human listeners. In this study, we first investigated how babble produced by 2 and 6 talkers affected the perceptual learning of English vowels by L2 listeners. We then fine-tuned Automatic Speech Recognition (ASR) models on the same babble datasets. The results showed that babble, regardless of the number of talkers, benefited listeners in speech-shaped noise; the Wav2Vec2.0 model likewise improved in accuracy after babble training and exhibited trends more similar to those of human listeners than the TDNN model did.