In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis. General text-to-speech synthesis frameworks for reading-style speech use text-dependent information, referred to as context. However, to achieve more human-like speech synthesis, paralinguistic and nonlinguistic features should also be taken into account. We focus on adding contextual features to the input features of DNN-based speech synthesis, using a spontaneous speech corpus with rich tags that include paralinguistic and nonlinguistic features such as prosody, disfluency, and morphological features. Through experimental evaluations, we investigate the effectiveness of these additional contextual factors and show which of them improve the naturalness of synthesized spontaneous speech. The results are intended to serve as a guide to data collection for speech synthesis.