ISCA Archive Eurospeech 1997

High resolution prosody modification for speech synthesis

Francisco M. Gimenez de los Galanes, David Talkin

This paper introduces RTIPS, a system for arbitrary high-resolution modification of the prosodic variables of speech: fundamental frequency, rhythm (segmental duration) and intensity. It is based on the Resample and Overlap-Add (R-OLA) algorithm for fundamental frequency and duration modification of speech. The algorithm works pitch-synchronously in order to modify the pitch contour accurately, and it uses estimates of the glottal closure instants (epochs) as synchronization marks. The technique is very similar to other OLA-based methods for time or pitch modification, but because of the added resampling step, voice quality (especially for high-pitched voices) is much more natural after resynthesis, at any given output sampling frequency. The reliability of the R-OLA algorithm depends heavily on the accuracy of the epoch detection method, so this preprocessing step has to be carefully designed.
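
The abstract does not give the details of R-OLA, so the following is only a generic sketch of the idea it describes: pitch-synchronous overlap-add with a per-segment resampling step. It is not the authors' implementation. The names `resample_segment` and `resample_ola` are hypothetical, the epochs are assumed to be already known, and linear interpolation stands in for whatever resampler a real system would use. Resampling each segment in this naive way also scales its spectral envelope.

```python
import numpy as np

def resample_segment(seg, new_len):
    """Linearly interpolate seg onto new_len evenly spaced points (assumed resampler)."""
    old_x = np.linspace(0.0, 1.0, num=len(seg))
    new_x = np.linspace(0.0, 1.0, num=new_len)
    return np.interp(new_x, old_x, seg)

def resample_ola(signal, epochs, pitch_factor):
    """Scale pitch by pitch_factor while keeping duration roughly unchanged.

    epochs: sorted sample indices of the synchronization marks (assumed given).
    """
    signal = np.asarray(signal, dtype=float)
    epochs = np.asarray(epochs, dtype=int)
    out = np.zeros(len(signal))
    t_out = float(epochs[1])  # first synthesis mark
    while t_out < epochs[-2]:
        # Use the interior analysis epoch closest to the current synthesis mark.
        i = 1 + int(np.argmin(np.abs(epochs[1:-1] - t_out)))
        left, center, right = epochs[i - 1], epochs[i], epochs[i + 1]
        n = int(right - left)
        # Periodic Hann window over the two periods around the epoch.
        window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
        seg = signal[left:right] * window
        # Resample the windowed segment to the target period length.
        new_len = max(2, int(round(n / pitch_factor)))
        seg = resample_segment(seg, new_len)
        # Overlap-add the segment centred on the synthesis mark.
        start = int(round(t_out)) - new_len // 2
        lo, hi = max(start, 0), min(start + new_len, len(out))
        out[lo:hi] += seg[lo - start:hi - start]
        # Advance by the scaled local period.
        t_out += (right - center) / pitch_factor
    return out

# Usage: a 100 Hz sine at 8 kHz with marks once per period, raised by an octave.
fs = 8000
t = np.arange(fs)
x = np.sin(2.0 * np.pi * 100.0 * t / fs)
marks = np.arange(0, fs, 80)
y = resample_ola(x, marks, pitch_factor=2.0)
```

Because the synthesis marks are spaced by the scaled period and each segment is shortened by the same factor, adjacent windows still overlap by half. With `pitch_factor=1.0` and evenly spaced marks, the interior of the signal is reproduced unchanged.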


doi: 10.21437/Eurospeech.1997-208

Cite as: Gimenez de los Galanes, F.M., Talkin, D. (1997) High resolution prosody modification for speech synthesis. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 557-560, doi: 10.21437/Eurospeech.1997-208

@inproceedings{gimenezdelosgalanes97_eurospeech,
  author={Francisco M. {Gimenez de los Galanes} and David Talkin},
  title={{High resolution prosody modification for speech synthesis}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={557--560},
  doi={10.21437/Eurospeech.1997-208},
  issn={1018-4074}
}