This paper describes our effort to build the SUTD-NUS system for the Blizzard Challenge 2021. The challenge has two tasks: 1) Hub task 2021-SH1: to build a Spanish text-to-speech (TTS) system using about 5 hours of data from a female European Spanish speaker, and 2) Spoke task 2021-SS1: to build a TTS system that can synthesize Spanish text containing a small number of English words, using the same training data as Hub task 2021-SH1. Our submitted system is an end-to-end TTS architecture that generates acoustic features from text input. A MelGAN neural vocoder is used to generate speech waveforms from the acoustic features for both the SH1 and SS1 tasks. Evaluation results provided by the challenge organizers demonstrate the effectiveness of our submitted TTS system.