We examined the accuracy of speech spectrograms reconstructed from neural responses recorded intracranially in human auditory cortex. Electrodes were implanted over the cortex of epilepsy patients for the localization of seizures, and neural responses were recorded as the subjects passively listened to continuous speech. We compared reconstructed spectrograms estimated with two different models: a linear regression model and a deep neural network. Compared with the linear regression model, the spectrograms reconstructed by the deep neural network achieved a higher average correlation with the original spectrograms. In addition, the reconstructed spectrograms from the neural network better preserved the average acoustic features of phones. We further investigated how changing the number of hidden layers in the network affects reconstruction accuracy and found better performance with deeper networks, particularly in reconstructing the spectrotemporal modulation content of speech. These findings demonstrate the efficacy of deep neural network models in decoding speech signals from neural responses and provide a method for improving the performance of brain-computer interfaces with prosthetic applications.
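
To make the comparison concrete, the sketch below illustrates the general decoding setup described above: a linear regression baseline and a small feed-forward network are each fit to map neural features onto a spectrogram, and reconstructions are scored by the average correlation across frequency bins. All data here is synthetic, and the electrode count, context window, network sizes, and regularization are illustrative assumptions rather than the configuration used in the study.

```python
# Minimal sketch: linear vs. nonlinear spectrogram reconstruction from
# (synthetic) neural responses, scored by mean correlation per frequency bin.
# Electrode counts, lags, and layer sizes are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_electrodes, n_freq_bins, context = 5000, 64, 32, 5

# Synthetic "neural responses": a linear mixture of the spectrogram plus noise.
spectrogram = np.abs(rng.standard_normal((n_samples, n_freq_bins)))
mixing = rng.standard_normal((n_freq_bins, n_electrodes)) / np.sqrt(n_freq_bins)
neural = spectrogram @ mixing + 0.5 * rng.standard_normal((n_samples, n_electrodes))

# Stack a short time-context window of neural activity as the decoder input.
lags = [np.roll(neural, k, axis=0) for k in range(context)]
X = np.concatenate(lags, axis=1)[context:]
Y = spectrogram[context:]

split = int(0.8 * len(X))
X_tr, X_te, Y_tr, Y_te = X[:split], X[split:], Y[:split], Y[split:]

def mean_correlation(Y_true, Y_pred):
    """Average Pearson correlation between true and reconstructed bins."""
    corrs = [np.corrcoef(Y_true[:, f], Y_pred[:, f])[0, 1]
             for f in range(Y_true.shape[1])]
    return float(np.mean(corrs))

# Linear baseline: ridge regression from neural features to the spectrogram.
linear = Ridge(alpha=1.0).fit(X_tr, Y_tr)
print("linear regression r =", mean_correlation(Y_te, linear.predict(X_te)))

# Nonlinear decoder: a small multilayer perceptron; adding entries to
# hidden_layer_sizes is one way to probe the effect of network depth.
mlp = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=300,
                   random_state=0).fit(X_tr, Y_tr)
print("neural network r  =", mean_correlation(Y_te, mlp.predict(X_te)))
```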