ISCA Archive Interspeech 2014

A flexible front-end for HTS

Matthew P. Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt

Parametric speech synthesis techniques depend on full-context acoustic models generated by language front-ends, which analyse linguistic and phonetic structure. HTS, the leading parametric synthesis system, can use a number of different front-ends to generate full-context models for synthesis and training. In this paper we explore the use of a new text processing front-end that has been added to the speech recognition toolkit Kaldi as part of an ongoing project to produce a new parametric speech synthesis system, Idlak. The use of XML specification files, a modular design, and modern coding and testing approaches makes the Idlak front-end ideal for adding, altering and experimenting with the contexts used in full-context acoustic models. The Idlak front-end was evaluated against the standard Festival front-end in the HTS system. Results from the Idlak front-end compare well with the more mature Festival front-end (2.83 MOS for Idlak vs. 2.85 MOS for Festival), although a slight reduction in naturalness perceived by non-native English speakers can be attributed to Festival's insertion of non-punctuated pauses.


doi: 10.21437/Interspeech.2014-320

Cite as: Aylett, M.P., Dall, R., Ghoshal, A., Henter, G.E., Merritt, T. (2014) A flexible front-end for HTS. Proc. Interspeech 2014, 1283-1287, doi: 10.21437/Interspeech.2014-320

@inproceedings{aylett14_interspeech,
  author={Matthew P. Aylett and Rasmus Dall and Arnab Ghoshal and Gustav Eje Henter and Thomas Merritt},
  title={{A flexible front-end for HTS}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={1283--1287},
  doi={10.21437/Interspeech.2014-320},
  issn={2308-457X}
}