Recent studies in deep learning based acoustic echo cancellation proves the benefits of introducing a linear echo cancellation module. However, the convergence problem and potential target speech distortion impose an additional learning burden for the neural network. In this paper, we propose a two-stage progressive neural network consisting of a coarse-stage and a fine-stage module. For the coarse-stage, a light-weighted network module is designed to suppress partial echo and potential noise, where a voice activity detection path is used to enhance the learned features. For the fine-stage, a larger network is employed to deal with the more complex echo path and restore the near-end speech. We have conducted extensive experiments to verify the proposed method, and the results show that the proposed two-stage method provides a superior performance to other state-of-the-art methods.