This paper introduces the TAL speech synthesis system for Blizzard Challenge 2021, which aims to synthesize speech that is as similar as possible to the provided target speaker. We built a Spanish speech synthesis system based on a pre-trained BERT model, GST, and HiFi-GAN for task 2021-SH1. First, we use a modified open-source Spanish front-end to generate Spanish phoneme sequences from the input Spanish text. Then, we construct a modified GST model that conditions the encoder on linguistic features. The acoustic model is trained on two speakers and then fine-tuned on the target speaker from the provided corpus. To speed up synthesis while maintaining speech quality, we use HiFi-GAN, an efficient and high-fidelity GAN-based vocoder, to convert mel-spectrograms into speech waveforms. The evaluation results show that our system performs well, especially in the word error rate evaluation.