Cued shadowing is a psycholinguistic task that captures the response speed and accuracy of participants' vocal repetition of target words. Due to its simplicity, the paradigm is widely used as a naturalistic measure of speech processing. Although the COVID-19 pandemic has driven the adaptation of many lab-based experiments to internet-based data collection, cued shadowing is not straightforward to adapt because of several challenges, including timing precision, efficient extraction of response latencies, and control over data quality. The current paper presents solutions to these challenges and describes the methodology for conducting cued shadowing of audio-video stimuli online with children and adults. The performance of two (semi-)automatic speech onset detection tools and of two experimental designs is evaluated. The technique developed enables millisecond precision in response time measurement and has great potential for the inclusion of minority and hard-to-reach communities in future speech perception and production research.