Foundation models have advanced speech technology while introducing privacy concerns due to the sources and volume of pre-training data required. Synthetic speech could be an alternative: short synthetic utterances are indistinguishable from natural speech, but limitations in prosody and tonal variation become apparent over longer durations. We investigate whether text-to-speech (TTS) systems have reached a point where they can substitute for natural speech when pre-training models for speech-based downstream tasks, e.g. phoneme recognition (PR). We also explore the degree to which these synthetic samples can be used when data augmentation is required. We pre-train three models using (i) natural speech; (ii) synthetic TTS speech cloned to match the natural speakers; and (iii) unmatched speech using standard voices provided by the state-of-the-art VITS TTS system. The models were fine-tuned for a PR task, and the results show that TTS data does not currently capture the long-term speech characteristics needed to replace natural speech in pre-training, but it has potential for low-resource data augmentation.