Speech synthesizers based on frequency-domain analysis tend to produce monotonous, machine-like speech when they discard phase information. Phase jitters (PJs) have been found to be an important factor in the naturalness of synthesized speech. We analyze the PJs of natural speech using pitch synchronous FFT and construct a PJ model from this analysis. We also demonstrate that speech synthesized from the power spectrum envelope (PSE) and the PJ components can be almost indistinguishable from natural speech.
Keywords: phase jitters, power spectrum envelope, pitch synchronous FFT
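
As a rough illustration of the analysis step, the following Python sketch takes one FFT per pitch period and measures how much each period's phase deviates from the average phase at the same frequency bin. It is only a sketch under stated assumptions, not the PJ model of this work: the function names, the truncation of every frame to the shortest pitch period, the availability of pitch marks, and the definition of jitter as deviation from the circular mean phase are all hypothetical choices made for illustration.

```python
import numpy as np


def pitch_synchronous_phases(signal, pitch_marks):
    """Return FFT phases, one row per pitch period.

    Assumption: pitch_marks holds the sample index at which each pitch
    period starts, and every frame is truncated to the shortest period
    so that all frames share the same FFT length.
    """
    pitch_marks = np.asarray(pitch_marks, dtype=int)
    frame_len = int(np.diff(pitch_marks).min())
    frames = np.stack([signal[m:m + frame_len] for m in pitch_marks[:-1]])
    spectra = np.fft.rfft(frames, axis=1)
    return np.angle(spectra)


def phase_jitter(phases):
    """Deviation of each period's phase from the circular mean per bin.

    Assumption: "jitter" here means the wrapped difference between a
    period's phase and the mean phase across periods; the result lies
    in (-pi, pi].
    """
    mean_phase = np.angle(np.exp(1j * phases).mean(axis=0))
    return np.angle(np.exp(1j * (phases - mean_phase)))


if __name__ == "__main__":
    # Synthetic, perfectly periodic signal: 5 harmonics, 80-sample period.
    period = 80
    n = np.arange(period * 6)
    sig = sum(np.cos(2 * np.pi * k * n / period + 0.3 * k) for k in range(1, 6))
    marks = np.arange(0, period * 6, period)

    pj = phase_jitter(pitch_synchronous_phases(sig, marks))
    # Jitter on the harmonic bins should be numerically zero here.
    print(np.abs(pj[:, 1:6]).max())
```

For natural speech the jitter values on the harmonic bins would not vanish, and their distribution per frequency band is the kind of quantity a PJ model could be fitted to. Bins that carry almost no energy have arbitrary phase, so in practice they would need to be masked before any statistics are computed.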