Text-to-Speech (TTS) systems have recently made great progress in synthesizing high-quality speech. However, the prosody of generated utterances is often less diverse than that of natural speech. In multi-speaker and voice-cloning systems this problem becomes even more severe, since the only information about prosody available to the model is contained in the input text and the speaker embedding. In this paper, we study the recently revealed phenomenon that speaker embeddings such as i-vectors and x-vectors carry emotional information. We show that these embeddings may contain dedicated components encoding prosodic information. We further propose a technique for finding such components and for generating emotional speaker embeddings by manipulating them. We then demonstrate that an emotional TTS system based on the proposed method performs well while requiring fewer trained parameters than solutions based on fine-tuning.
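As one illustration of the kind of embedding manipulation described above, consider the following minimal sketch. It is not the paper's actual algorithm: the mean-difference direction, the 512-dimensional toy x-vectors, and the intensity parameter `alpha` are all assumptions made purely for this example.

```python
import numpy as np

def emotion_direction(neutral_embs: np.ndarray,
                      emotional_embs: np.ndarray) -> np.ndarray:
    """Estimate a unit vector pointing from neutral toward emotional
    speech in embedding space. Here it is simply the normalized
    difference of class means; one plausible way to locate a
    prosody-encoding component."""
    direction = emotional_embs.mean(axis=0) - neutral_embs.mean(axis=0)
    return direction / np.linalg.norm(direction)

def emotionalize(speaker_emb: np.ndarray,
                 direction: np.ndarray,
                 alpha: float = 1.0) -> np.ndarray:
    """Shift a neutral speaker embedding along the emotion direction;
    alpha (hypothetical) controls the intensity of the imposed emotion."""
    return speaker_emb + alpha * direction

# Toy usage with random 512-dim vectors standing in for real x-vectors.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(100, 512))
happy = neutral + rng.normal(loc=0.5, scale=0.1, size=(100, 512))

d = emotion_direction(neutral, happy)
new_speaker = rng.normal(size=512)   # embedding of an unseen speaker
happy_speaker = emotionalize(new_speaker, d, alpha=2.0)
```

In practice the direction would be estimated from real neutral and emotional x-vectors, and alternatives such as a principal component or the weight vector of a linear emotion classifier could serve as the dedicated component instead of the class-mean difference.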