When language learners listen to L2 speech, they often experience listening disfluencies or breakdowns. Although listening disfluencies are mental phenomena, previous studies showed that they can be measured acoustically by asking learners to shadow the L2 speech: inarticulate productions in shadowing can reasonably be attributed to listening disfluencies. In this paper, we model the measured listening disfluencies with a bidirectional LSTM (BLSTM) and attempt to predict which words in new listening drills are difficult to perceive correctly. Drawing on studies in psycholinguistics and applied linguistics that identified the factors influencing human perception of spoken words, we extract speech and text features from the listening drills and use them for prediction. Experiments show that our model outperforms previously proposed models and that learner factors are highly effective for prediction, because learners develop through training.