Most existing deep neural network based speech enhancement methods operate in the short-time Fourier transform (STFT) domain or on learned features, without exploiting the speech production model. In this work, we present an efficient speech enhancement algorithm based on the source-filter model of speech. Concretely, we separate each framed speech segment into excitation and vocal tract components via homomorphic filtering, adopt two convolutional recurrent networks to estimate the target magnitudes of the separated components, and synthesize a minimum-phase signal from the estimated components. Finally, the enhanced speech is obtained through a post-processing step that reuses the noisy phase and applies overlap-add synthesis. Experimental results demonstrate that the proposed method achieves performance comparable to a state-of-the-art complex-valued neural network based method. In addition, extensive experiments show that the proposed method is more efficient, with a more compact model.
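The signal-processing front and back ends of this pipeline can be illustrated with a minimal NumPy sketch of the homomorphic (cepstral) decomposition and the minimum-phase synthesis. The function names, FFT size, and lifter cutoff below are illustrative assumptions for exposition, not the paper's exact configuration.

```python
import numpy as np

def homomorphic_split(frame, n_fft=512, lifter_cutoff=30):
    """Split a windowed speech frame into vocal-tract (spectral envelope)
    and excitation log-magnitude spectra by liftering the real cepstrum.
    `lifter_cutoff` is the assumed quefrency cutoff in cepstral bins."""
    spec = np.fft.rfft(frame, n_fft)
    log_mag = np.log(np.abs(spec) + 1e-8)
    cep = np.fft.irfft(log_mag, n_fft)            # real cepstrum of the frame
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0                  # low quefrencies: vocal tract
    lifter[-(lifter_cutoff - 1):] = 1.0           # keep the cepstrum symmetric
    vocal_tract_log = np.fft.rfft(cep * lifter, n_fft).real
    excitation_log = log_mag - vocal_tract_log    # residual: excitation
    return vocal_tract_log, excitation_log

def minimum_phase_signal(log_mag, n_fft=512):
    """Reconstruct a minimum-phase time signal whose magnitude spectrum
    matches exp(log_mag), via the standard cepstral folding method."""
    cep = np.fft.irfft(log_mag, n_fft)
    cep_min = np.zeros(n_fft)
    cep_min[0] = cep[0]
    cep_min[1:n_fft // 2] = 2.0 * cep[1:n_fft // 2]
    cep_min[n_fft // 2] = cep[n_fft // 2]
    spec_min = np.exp(np.fft.rfft(cep_min, n_fft))
    return np.fft.irfft(spec_min, n_fft)
```

In this sketch, the two log-magnitude components returned by `homomorphic_split` stand in for the inputs whose clean versions the two convolutional recurrent networks would estimate; their estimated sum would then be passed to `minimum_phase_signal`, after which the noisy phase and overlap-add would yield the enhanced waveform.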