The TELVID corpus is a new multi-language, multi-modal resource for speaker recognition, comprising multiple conversational telephone speech and video recordings from each of 300 multilingual speakers. Consenting subjects contributed recordings under a wide variety of conditions, with a minimum of 11 calls, 10 videos and one selfie image per person. Every speaker made recordings in Tunisian Arabic, North African French and/or English, along with two “freestyle” recordings in the speaker’s choice of any language, dialect or mix of varieties. Recordings were audited to verify quality and speaker identity, and portions of the data were selected as test data for the NIST 2024 Speaker Recognition Evaluation. We developed audio and visual baseline systems and measured their performance. The TELVID corpus will be published in the Linguistic Data Consortium Catalog, making it broadly available for language-related research, education and technology development.