To model the speech utterance at a finer granularity, this paper presents a novel state-alignment-based supervector modeling method for text-independent speaker verification, which takes advantage of the state-alignment method used in hidden Markov model (HMM) based acoustic modeling for speech recognition. In this way, the proposed method converts a text-independent speaker verification problem into a state-dependent one. First, phoneme HMMs are trained. Then, a clustered-state Gaussian mixture model (GMM) is trained in a data-driven manner from the states of all phoneme HMMs. Next, a given speech utterance is modeled as state-level sub-GMM supervectors, which are then aligned and concatenated into a final supervector. In addition, to account for duration differences between states, a weighting method is proposed for kernel-based support vector machine (SVM) classification. Experimental results on the SRE 2008 core-core dataset show that the proposed methods outperform traditional GMM supervector modeling followed by SVM (GSV-SVM), yielding relative improvements of 8.4% in EER and 5.9% in minDCF, respectively.
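For illustration, the following is a minimal sketch of how state-level sub-GMM supervectors might be concatenated into a single duration-weighted supervector before SVM scoring. The data structures, function name, and the proportional weight formula are assumptions made for this example, not the paper's exact scheme.

```python
import numpy as np

def state_aligned_supervector(state_means, ubm_state_means, state_durations):
    """Concatenate per-state adapted mean vectors into one supervector,
    scaling each state block by a weight derived from its frame count.

    state_means:      dict {state_id: (num_mix, feat_dim) adapted means}
    ubm_state_means:  dict {state_id: (num_mix, feat_dim) UBM means}; defines state order
    state_durations:  dict {state_id: number of frames aligned to that state}
    """
    total_frames = sum(state_durations.values()) or 1
    blocks = []
    for state_id in sorted(ubm_state_means):
        # Fall back to the UBM means when a state received no frames.
        means = state_means.get(state_id, ubm_state_means[state_id])
        # Duration-based weight: states covering more frames contribute more
        # to the kernel (a simple proportional choice for illustration).
        weight = state_durations.get(state_id, 0) / total_frames
        blocks.append(np.sqrt(weight) * means.ravel())
    return np.concatenate(blocks)
```

Under this sketch, supervectors from enrollment and test utterances would be passed to a linear-kernel SVM, so the duration weights directly shape each state's contribution to the kernel value.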