We introduce VCTUBE, an open-source Python library that automatically generates <audio, text> pairs of speech data from a given YouTube URL. We believe VCTUBE makes it easy to collect, process, and annotate speech data for developing speech synthesis systems.