CapCap is an output-agreement game that challenges players' listening and speaking skills. Playing in one of three pre-specified modes, players submit transcriptions of short video segments against a countdown timer to score points and support their team. By adding entertainment value, the game channels player input toward captioning videos without monetary rewards. It deploys a novel human computation algorithm that collects input from a crowd of non-experts, both sequentially and in parallel, until a completion criterion is met. In place of monetary incentives, CapCap relies on motivational mechanisms such as indirect feedback, a mix of player skills, and community identification. In preliminary results from a field trial with mostly non-native English speakers, the captions produced through the game had a lower word error rate (WER) than automatic speech recognition (ASR) output.
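
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a hypothesis and a reference transcript (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, not CapCap's evaluation code; the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: tokens are lowercased, whitespace-separated words.
    Real evaluations usually also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a caption "improves on" ASR output when its WER against the same reference transcript is lower. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.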