We investigate how the scale of a Text-to-Speech (TTS) model's training data influences Automatic Speech Recognition (ASR) performance when real training data is replaced entirely by synthetic speech. We propose an extension to established data scaling laws that incorporates an additional term capturing the mismatch between the real and synthetic distributions in low-data regimes. We compare Mean Squared Error (MSE) regression and Denoising Diffusion Probabilistic Models (DDPMs) as TTS training objectives: MSE-based speech, though oversmoothed, yields stronger ASR results when the TTS training set is small, while DDPM-based speech surpasses MSE once the TTS model is trained on enough data to better approximate the real distribution. Our findings also show that synthetic speech can only approach or match real-data performance if the TTS model itself is trained on a sufficiently large corpus, underscoring that distribution coverage is crucial for fully synthetic ASR training.
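To make the proposed extension concrete, the following is a minimal sketch of one plausible functional form, not the paper's definitive law: it assumes the established power-law fit $\mathrm{WER}(n) \approx \varepsilon_\infty + \beta\, n^{-\alpha}$ over ASR training set size $n$, and the added mismatch term with size $m$, exponent $\eta$, and coefficient $\gamma$ is hypothetical notation introduced here for illustration.
\[
\mathrm{WER}(n, m) \;\approx\; \varepsilon_\infty \;+\; \beta\, n^{-\alpha} \;+\; \underbrace{\gamma\, m^{-\eta}}_{\text{real/synthetic mismatch}},
\]
where $n$ is the amount of (synthetic) ASR training data, $m$ is the amount of real data used to train the TTS model, and $\varepsilon_\infty$ is the irreducible error. Under this form the mismatch penalty dominates in the low-$m$ regime and vanishes as the TTS corpus grows, consistent with the abstract's claim that synthetic speech approaches real-data performance only when the TTS model is trained on a sufficiently large corpus.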