This paper describes conversational spontaneous speech synthesis based on hidden Markov model (HMM). To reduce the amount of data required for model training, we utilize average-voice-based speech synthesis framework, which has been shown to be effective for synthesizing speech with arbitrary speaker's voice using a small amount of training data. We examine several kinds of average voice model using reading-style speech and/or conversational speech. We also examine an appropriate utterance unit for conversational speech synthesis. Experimental results show that the proposed two-stage model adaptation method improves the quality of synthetic conversational speech.