ISCA Archive Blizzard 2012

The Lessac Technologies Time Domain Diphone Parametric Synthesis System for Microcontrollers for Blizzard Challenge 2012

Mike Baumgartner, Reiner Wilhelms-Tricarico, John Reichenbach

Advances in the capabilities of microcomputer systems have opened the door to new approaches to real-time speech synthesis. In the past, diphone synthesis was a popular synthesis method. More recently, unit selection synthesis has afforded higher quality, mainly by eliminating the need for significant signal processing and thus avoiding the artifacts caused by modifying speech segments. Instead, unit selection output consists substantially of real segments of unaltered speech. It was hoped that voice databases containing enough recorded speech would provide sufficient coverage for any utterance a synthesizer might be required to produce. Yet even as unit selection databases have grown considerably larger, constructing natural speech entirely from unaltered speech units has still fallen short of expectations. The Blizzard Challenge provides a means to quantify the difference in quality between the newer unit selection approaches and the older diphone synthesis methods. The diphone synthesis system presented here is also an example of working toward high quality synthesis that still runs on very limited hardware resources.