The first question any customer asks before buying or renting a product or service is always “How fast is it?”. For our Text-To-Speech (TTS) system, that question becomes: how fast can it turn my text into speech? In other words, what is the Customer Perceived Latency (CPL), i.e. the total time from the moment a customer submits text to the moment they receive the audio file for that text? Like any other service, TTS comprises a number of moving parts that contribute to CPL: for example, the SSML parser, Text Normaliser, G2P (grapheme-to-phoneme conversion), the models (Acoustic Model and Vocoder), and post-processing. Each layer adds to the overall processing time. The goal is to reduce this processing time without any deterioration in audio quality.
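To make the idea of per-layer contributions concrete, here is a minimal sketch of how the time spent in each stage could be measured. The stage names mirror the components listed above, but the stage functions are hypothetical pass-through placeholders rather than our actual implementation, and `run_pipeline` is a name invented for this illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(text: str, stages: List[Stage]) -> Tuple[Any, Dict[str, float]]:
    """Run each stage in order and record how long each one takes, in seconds."""
    timings: Dict[str, float] = {}
    data: Any = text
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def passthrough(x: Any) -> Any:
    """Hypothetical placeholder: a real stage would transform its input."""
    return x


# Assumed stage order, following the list of components in the text.
stages: List[Stage] = [
    ("ssml_parser", passthrough),
    ("text_normaliser", passthrough),
    ("g2p", passthrough),
    ("acoustic_model", passthrough),
    ("vocoder", passthrough),
    ("post_processing", passthrough),
]

audio, timings = run_pipeline("Hello world", stages)
processing_time = sum(timings.values())

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
print(f"total processing time: {processing_time * 1000:.3f} ms")
```

This sketch only measures time spent inside the stages themselves. Anything that happens outside them, such as network transfer, is not captured, so the total here is at best a lower bound on CPL.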