Pitch-scaled analysis based residual reconstruction for speech analysis and synthesis

Zhengqi Wen, Hideki Kawahara, Jianhua Tao

A typical problem with LPC-like vocoders is a buzzing sound, which is mainly due to the simple pulse-train or noise excitation model. One way to reduce it is to reconstruct the residual obtained from inverse filtering. This paper therefore proposes a new parametric representation of speech based on pitch-scaled analysis. Pitch-scaled analysis is used to extract the periodic spectrum of the residual with half pitch period length. These periodic spectra are then decorrelated by principal component analysis (PCA) to reduce their dimensionality. The aperiodic measure is defined as the harmonic-to-noise ratio in the frequency domain, where the voicing cut-off frequency (VCO) is used to control the smoothness of the aperiodicity. The periodic spectrum and the aperiodic measure, together with F0, serve as the excitation parameters of the proposed LPC vocoder. Experimental results show that the proposed vocoder achieves a mean opinion score (MOS) of 4.1 for a female voice before dimensionality reduction and retains its high quality after parameter compression.
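
The PCA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pca_fit`, `pca_encode`, `pca_decode`), the frame and bin counts, and the synthetic rank-3 data standing in for per-frame periodic spectra are all assumptions made for illustration. It shows only the generic idea of projecting spectral vectors onto a few decorrelated principal components and reconstructing them.

```python
import numpy as np

def pca_fit(spectra, n_components):
    """Fit PCA to an (n_frames, n_bins) matrix of spectral vectors."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def pca_encode(spectra, mean, components):
    """Project spectra onto the retained components (compressed parameters)."""
    return (spectra - mean) @ components.T

def pca_decode(codes, mean, components):
    """Reconstruct approximate spectra from the compressed parameters."""
    return codes @ components + mean

# Synthetic stand-in data (an assumption): 200 frames, 64 bins, true rank 3.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 64))
weights = rng.standard_normal((200, 3))
spectra = weights @ basis + 5.0

mean, components = pca_fit(spectra, n_components=3)
codes = pca_encode(spectra, mean, components)    # shape (200, 3)
recon = pca_decode(codes, mean, components)      # shape (200, 64)
```

Because the synthetic data has exactly three degrees of freedom, three components reconstruct it to within floating-point error. Real residual spectra would not be exactly low-rank, so the number of retained components would trade compression against reconstruction error.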

Index Terms: speech parametric representation, pitch-scaled analysis, voicing cut-off frequency, principal component analysis

doi: 10.21437/Interspeech.2012-136

Cite as: Wen, Z., Kawahara, H., Tao, J. (2012) Pitch-scaled analysis based residual reconstruction for speech analysis and synthesis. Proc. Interspeech 2012, 374-377, doi: 10.21437/Interspeech.2012-136

