This paper explores the continuous prediction of backchannel timing in conversational speech, with the aim of making turn-taking in human-robot interaction more natural. To ensure real-time prediction, we present regression models that rely exclusively on acoustic features that can be extracted continuously from the user's speech. Comparing different machine learning models, we found that LightGBM models performed best with respect to accuracy (mean absolute error: approx. 130 ms) and efficiency, while meeting the real-time requirement. Our analysis of feature importances revealed that speaking duration, intensity, and fundamental frequency are among the most important predictors of backchannel timing when extracted in the window from 275 to 875 ms before a backchannel in the interlocutor's turn. Given the strong predictive performance of our models, this work provides a foundation for implementing more natural and responsive conversational agents.