There is an urgent need for scalable, non-invasive and quantifiable biomarkers in neurodegenerative disorders. Speech is an attractive candidate, with potential for remote and low-cost assessment. Progress is limited by a lack of high-quality, clinically annotated speech data. We present a longitudinal speech corpus including speakers with dementia, motor neuron disease, Parkinson’s disease and progressive multiple sclerosis, as well as healthy individuals. Participants complete standardised recordings on an app co-produced with patients, aligned to contemporaneous phenotyping (clinical rating scales, cognitive tests and blood-based biomarkers). To date, 780 participants have provided 5169 recordings across 1033 assessments. Benchmark classification and regression models show promising performance, and predictions on non-speech segments demonstrate limited bias from recording conditions. We continue to scale up data collection and analysis across larger, diverse populations to accelerate clinical translation.