Spoof speech detection (SSD) protects automatic speaker verification systems from malicious voice attacks. Existing SSD systems share a common challenge: spoof attacks that were absent from the training phase are difficult to identify. This problem arises because, on class-imbalanced data, deep learning models tend to learn non-generalizable spurious features of the spoof speech in the training set, rather than the core features of bonafide speech that fundamentally distinguish bonafide from spoof speech. To overcome this challenge, we propose a progressive training method based on an improved virtual softmax and data balancing that helps the SSD model learn the core features representing bonafide speech, thereby modeling the distribution of bonafide speech and delineating its boundary against spoof speech. First, progressive training starts from a class-balanced subset of the training data, so that the model can learn core features accurately without being overwhelmed by a large number of spurious features; the subset then gradually expands to the full training set. In addition, the improved virtual softmax uses a set of masks so that the added virtual features guide the model to learn only the core features representing bonafide speech, while relaxing the clustering requirements on the various kinds of spoof speech. This lets the SSD model use core features to reject spoof speech, and it also resolves the overfitting caused by over-clustering spoof samples that do not in fact belong to a single category. We trained the model on the ASVspoof 2019 LA and ASVspoof 5 training sets.
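The two components above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not our full implementation: `balanced_subset` assumes the curriculum grows both classes by a fraction `frac` while capping the spoof count to keep early stages balanced, and `masked_virtual_softmax_loss` assumes the virtual logit is the feature norm (as in standard virtual softmax, where the virtual weight is the sample's own normalized feature) with a mask that enables it only for bonafide samples.

```python
import numpy as np

def balanced_subset(bona_idx, spoof_idx, frac, rng):
    """Progressive curriculum (sketch): a class-balanced subset that
    grows toward the full, imbalanced training set as frac goes 0 -> 1."""
    n_bona = max(1, int(round(frac * len(bona_idx))))
    n_spoof = max(1, int(round(frac * len(spoof_idx))))
    if frac < 1.0:
        n_spoof = min(n_spoof, n_bona)  # keep early stages class-balanced
    sel = np.concatenate([rng.choice(bona_idx, n_bona, replace=False),
                          rng.choice(spoof_idx, n_spoof, replace=False)])
    rng.shuffle(sel)
    return sel

def masked_virtual_softmax_loss(X, W, y, bonafide_label=0):
    """Cross-entropy with an appended virtual class (sketch).

    The virtual weight for each sample is its own L2-normalized feature,
    so the virtual logit equals ||x||.  The mask enables this extra
    competitor only for bonafide samples, tightening the bonafide cluster
    while leaving the clustering of spoof samples unconstrained.
    """
    logits = X @ W.T                        # (N, C) real-class logits
    virt = np.linalg.norm(X, axis=1)        # (N,)  virtual logit = ||x||
    mask = (y == bonafide_label)
    virt = np.where(mask, virt, -np.inf)    # masked out -> zero probability
    all_logits = np.concatenate([logits, virt[:, None]], axis=1)
    all_logits -= all_logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = all_logits - np.log(np.exp(all_logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()
```

The virtual class is never a target label; it only enlarges the softmax denominator for bonafide samples, which pushes their features toward their class weight and carves a tighter bonafide boundary.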
Evaluation results on multiple datasets, including ASVspoof 2019 LA eval, ASVspoof 2021 LA eval, ASVspoof 2021 DF eval, ASVspoof 2015 eval, In-the-Wild, and ASVspoof 5 eval, demonstrate the generalization capability of our method.