Reconstruction of speech envelope from neural signal is a general way to study neural entrainment, which helps to understand the neural mechanism underlying speech processing. Previous neural entrainment studies were mainly based on single-trial neural activities, and the reconstruction accuracy of speech envelope is not high enough, probably due to the interferences from diverse noises such as breath and heartbeat. Considering that such noises independently emerge in the consistent neural processing of the subjects responding to the same speech stimulus, we proposed a method to align and average electroencephalograph (EEG) signals of the subjects for the same stimuli to reduce the noises of neural signals. Pearson correlation of constructed speech envelops with the original ones showed a great improvement comparing to the single-trial based method. Our study improved the correlation coefficient in delta band from around 0.25 to 0.5, where 0.25 was obtained in previous leading studies based on single-trial. The speech tracking phenomenon not only occurred in the commonly reported delta and theta band, but also occurred in the gamma band of EEG. Moreover, the reconstruction accuracy for regular speech was higher than that for the time-reversed speech, suggesting that neural entrainment to natural speech envelope reflects speech semantics.