The main objective of this work is to study expressivity transfer to a speaker's voice for which no expressive speech data is available, in non-autoregressive end-to-end TTS systems. We investigated the expressivity transfer capability of probability density estimation based on deep generative models, namely Generative Flow (Glow) and diffusion probabilistic models (DPM). Deep generative models provide better log-likelihood estimates and tractability, and consequently high-quality speech synthesis with faster inference. Furthermore, we propose the use of several expressivity encoders that assist expressivity transfer in the text-to-speech (TTS) system. More precisely, we used self-attention statistical pooling and multi-scale expressivity encoder architectures to create a meaningful representation of expressivity. In addition to the traditional subjective metrics used for speech synthesis evaluation, we incorporated cosine similarity to measure the strength of the attributes associated with speaker identity and expressivity. The non-autoregressive TTS system with a multi-scale expressivity encoder showed better expressivity transfer with both Glow- and DPM-based decoders, illustrating the ability of the multi-scale architecture to capture the underlying attributes of expressivity from multiple acoustic features.
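To make the cosine-similarity evaluation concrete, the minimal sketch below scores the similarity between embeddings extracted from reference and synthesized utterances. The embedding extractor and the 256-dimensional vectors are hypothetical placeholders, not the paper's actual encoders; only the similarity computation itself is shown.

```python
# Minimal sketch: cosine similarity between embeddings of reference and
# synthesized speech. The embeddings would come from a pretrained speaker
# or expressivity encoder (placeholder values are used here).
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder embeddings of dimension 256 (assumed size for illustration).
rng = np.random.default_rng(0)
ref_embedding = rng.normal(size=256)  # e.g. extracted from reference audio
syn_embedding = rng.normal(size=256)  # e.g. extracted from synthesized audio

score = cosine_similarity(ref_embedding, syn_embedding)
print(f"similarity: {score:.3f}")
```

A higher score indicates that the synthesized speech preserves more of the measured attribute (speaker identity or expressivity) relative to the reference.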