In this paper, we propose a Mandarin-English bilingual and code-switching text-to-speech (TTS) system that combines a diffusion model with a generative adversarial network (GAN) to improve the output speech. To ensure speaker consistency, we employ a feature-separation architecture that converts language and speaker IDs into embeddings fed to the encoder. We then use two adversarial classifiers and two standard classifiers to disentangle language and speaker features. We integrate a modified diffusion model and discriminators to further improve speech quality and speaker consistency, particularly in code-switching scenarios. On the MOS measure, the proposed TTS system differs only slightly from the ground-truth data on monolingual speech and achieves a MOS of 3.83 on code-switching speech synthesis.