Conversational agents are expected to listen to narratives in place of human listeners. To be recognized as narrative listeners, these agents must generate responses that indicate attentive listening at appropriate times. However, narrative and response data are not always available in sufficient quantity to train statistical models. One solution is to utilize data other than response data. In this study, inspired by the relationship between punctuation marks and attentive listening responses, we utilize text data to train a response timing detection model. Specifically, the model is first trained on a punctuation insertion task using text data and then trained on the response data. We conducted a response timing detection experiment to evaluate the effect of utilizing text data for varying amounts of response data. The results showed that utilizing text data improved response timing detection performance, especially when the response data were limited.
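
As an illustration of the punctuation insertion task mentioned above, the following minimal sketch shows one way plain text could be converted into token-level training labels. It is not the implementation used in this study: the function name `make_punctuation_examples`, the label set (`COMMA`, `PERIOD`, `NONE`), and whitespace tokenization are all assumptions made for the example.

```python
# Hypothetical helper (not from the study): derive token-level labels for a
# punctuation insertion task from punctuated text. The label set and the
# whitespace tokenization are assumptions made for illustration only.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD"}


def make_punctuation_examples(text):
    """Return (tokens, labels), where labels[i] names the mark after tokens[i]."""
    tokens, labels = [], []
    for raw in text.split():
        word = raw.rstrip(",.")      # token with trailing punctuation removed
        mark = raw[len(word):]       # the stripped trailing punctuation, if any
        if not word:                 # skip items that are punctuation only
            continue
        tokens.append(word)
        labels.append(PUNCT_LABELS.get(mark[:1], "NONE"))
    return tokens, labels


tokens, labels = make_punctuation_examples(
    "Yesterday, I went to the park. It was sunny."
)
```

Under these assumptions, a sequence labeler trained on such pairs predicts, after each token, whether a punctuation mark follows. The same per-token output format could then be reused when the model is trained on response data, with the labels marking positions where a listening response occurred.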