This paper describes a text-to-speech (TTS) system developed at the Nagoya Institute of Technology (NITech) for the Blizzard Challenge 2018. In the challenge, about seven hours of highly expressive speech data from English children’s audiobooks were provided as training data. For this challenge, we introduced a deep neural network (DNN)-based pause insertion model and a WaveNet-based neural vocoder. Large-scale subjective evaluation results show that the NITech TTS system achieved high scores on various evaluation criteria.