This paper presents the SRCB-SL text-to-speech system that participated in Blizzard Challenge 2021. This year’s Challenge was in European Spanish and came with 5 hours of clean speech data from a female native speaker. It included two tasks: a hub task that asked participants to build a voice from the provided data and synthesize all-Spanish speech, and a spoke task in which the target speech contained a few English words. Our system used a three-stage pipeline of text analysis, acoustic model, and vocoder. The text analyzer combined several existing and newly developed modules to convert input text into a sequence of Spanish phonemes with prosodic boundary (break) markers. In the spoke task, English phonemes were mapped to their Spanish counterparts. The acoustic model was built around FastSpeech and converted the phoneme sequences from text analysis into mel-spectrograms. For the vocoder we used HiFi-GAN, which we trained on the Challenge data and then fine-tuned using predicted mel-spectrograms as input. The same system was used for both tasks. Challenge results showed that our system (identified as K) performed well on most of the evaluation criteria, which supports the effectiveness of our method.
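The spoke-task step of mapping English phonemes to Spanish counterparts can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the system itself: the function name `map_to_spanish`, the break labels `#1` to `#3`, the phoneme symbols, and the entries of the mapping table are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical break-marker labels; the actual marker inventory is not given here.
BREAK_MARKERS = {"#1", "#2", "#3"}

# Hypothetical English-to-Spanish phoneme table (illustrative entries only).
EN_TO_ES: Dict[str, List[str]] = {
    "AE": ["a"],
    "IH": ["i"],
    "UW": ["u"],
    "SH": ["tS"],
    "V": ["b"],
    "Z": ["s"],
}


def map_to_spanish(tokens: List[Tuple[str, str]]) -> List[str]:
    """Map a mixed sequence of (symbol, lang) pairs to Spanish-only symbols.

    `lang` is "es" or "en". Spanish phonemes and break markers pass through
    unchanged; English phonemes are replaced via the lookup table.
    """
    out: List[str] = []
    for symbol, lang in tokens:
        if symbol in BREAK_MARKERS or lang == "es":
            out.append(symbol)
        elif symbol in EN_TO_ES:
            out.extend(EN_TO_ES[symbol])
        else:
            # Fail loudly rather than guess a counterpart for an unmapped symbol.
            raise KeyError(f"no Spanish counterpart defined for {symbol!r}")
    return out
```

With a table of this kind, the acoustic model only ever sees symbols from the Spanish inventory, which is consistent with using one system for both tasks.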