A Japanese text-to-speech conversion software system has been developed, based on two newly proposed techniques. The first is a prosody generation model that generates natural rhythm and intonation by exploiting prosodic structures that remain invariant regardless of speech tempo. The second is a waveform concatenation method that uses expanded CV-VC speech units. The system runs on standard personal computers (PCs) without additional hardware such as digital signal processors (DSPs), so the installation cost is very low. Application interfaces are also provided so that application programs can easily use the text-to-speech function.