Several recent techniques use sophisticated algorithms to correct the phase, or to exploit phase information, by processing the real and imaginary parts of the STFT either separately or jointly. However, real-valued neural networks cannot directly process a complex-valued feature such as the STFT of noisy speech. As a result, methods that estimate the STFT of the clean signal can obtain only sub-optimal solutions. To avoid complex-valued operations, we propose using only the real part of the 2K-point STFT as the feature, which retains all of the signal information. Speech enhancement in the STFT domain thus becomes a real-valued task. It is nevertheless hard for the network to estimate the correct sign of the real part. We therefore develop an estimator that predicts the sign of the real part and a decoder that estimates a mask for the target real part. We then conduct experiments to evaluate our model on a range of metrics. The results indicate that our model outperforms several state-of-the-art (SOTA) models.
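The claim that the real part alone retains all signal information can be illustrated with a minimal sketch. It assumes, and this is an assumption not stated above, that each frame contains K real samples and is zero-padded to 2K points before the transform; windowing and overlap-add are omitted, and the function names are hypothetical. Under that assumption, the inverse DFT of the real part gives the even part of the zero-padded frame, from which the original K samples follow directly.

```python
import numpy as np


def real_part_2k(frame):
    """Real part of the 2K-point DFT of a K-sample real frame (assumed zero-padded)."""
    K = len(frame)
    # np.fft.fft zero-pads the input to length n when n > len(frame).
    return np.fft.fft(frame, n=2 * K).real


def reconstruct_from_real(real_part):
    """Recover the K-sample frame from the real part of its 2K-point DFT."""
    K = len(real_part) // 2
    # The inverse DFT of Re{X[k]} is the even part of the zero-padded frame:
    # x_e[n] = (x[n] + x[(-n) mod 2K]) / 2.
    even = np.fft.ifft(real_part).real
    # For n = 1..K-1 the mirrored sample lies in the zero padding, so x[n] = 2 * x_e[n].
    frame = 2.0 * even[:K]
    # For n = 0 the sample mirrors onto itself, so x[0] = x_e[0].
    frame[0] = even[0]
    return frame


rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # one frame with K = 8 samples
x_hat = reconstruct_from_real(real_part_2k(x))
print(np.allclose(x, x_hat))
```

If the frame filled all 2K points instead of being zero-padded, the mirrored samples would no longer be zero and the real part would determine only the even part of the frame, so the sketch depends on the padding assumption.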