Although recent text-to-speech (TTS) models based on flow matching achieve remarkable generation quality, their reliance on many sampling steps hinders practical deployment. In this work, we introduce an adversarial post-training strategy for flow matching TTS that substantially reduces the number of required sampling steps. Our approach treats a pre-trained flow matching model as a few-step generator and optimizes it with reconstruction and adversarial objectives. We integrate this technique into APTTS, our novel latent flow matching framework for zero-shot TTS, and show that it outperforms state-of-the-art baselines while remaining suitable for real-time use. Furthermore, we validate the generality of our adversarial post-training approach by applying it to Matcha-TTS, a publicly available flow matching model. Evaluations on a multi-speaker dataset show that our method improves audio quality while reducing inference time, underscoring its potential as a scalable solution for real-time TTS.