ISCA Archive Interspeech 2014

A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech

Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

In statistical speech synthesis, the quality of the synthesized speech depends on the quality of the training data. Because the sampling rate is one of the factors that affect this quality, speech data has recently been recorded at high sampling rates. However, speech data recorded in the past or collected from the Internet often has a low sampling rate. To use such data effectively for model training, we propose a mel-cepstral analysis technique that uses a statistical approach to restore the missing high-frequency components of low-sampling-rate speech. In this technique, high-sampling-rate speech waveforms are modeled directly by integrating the feature extraction and modeling processes. This framework makes it possible to optimize the whole process on the basis of an integrated objective function. Mel-cepstral coefficients are then estimated from the low-sampling-rate speech by using this model as a prior distribution. Experimental results show that the proposed method improves the quality of the synthesized speech.
doi: 10.21437/Interspeech.2014-535

Cite as: Nakamura, K., Hashimoto, K., Oura, K., Nankaku, Y., Tokuda, K. (2014) A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech. Proc. Interspeech 2014, 2494-2498, doi: 10.21437/Interspeech.2014-535

@inproceedings{nakamura14_interspeech,
  author={Kazuhiro Nakamura and Kei Hashimoto and Keiichiro Oura and Yoshihiko Nankaku and Keiichi Tokuda},
  title={{A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2494--2498},
  doi={10.21437/Interspeech.2014-535},
  issn={2308-457X}
}