Previous studies on early Alzheimer’s disease (AD) detection using speech have been limited by small sample sizes, scarce prodromal-phase recordings, and minimal longitudinal data. Many also rely on standardized tasks that do not reflect natural language use. To address these gaps, we introduce ADCeleb, a longitudinal speech dataset comprising publicly available recordings from individuals who later disclosed an AD diagnosis. It includes samples from 40 individuals with prodromal AD and 40 cognitively normal controls (CNs), matched by age and sex, spanning 1 to 10 years before diagnosis. Classification experiments using multimodal models integrating speech and text achieved 0.72 accuracy in distinguishing prodromal AD cases from CNs in the 6- to 10-year pre-diagnosis window and 0.80 in the 1- to 5-year window. These results highlight the potential of speech-based technologies as non-invasive tools for early AD detection in real-world settings and for improving participant triage in clinical trials.