Observation of the speech spectrum shows that speech has a characteristic pattern of spectral fluctuation along both time and frequency. In this paper, we exploit this property in a multi-feature approach to voice activity detection (VAD). We analyze the effect of separating this speech-specific spectral fluctuation with multi-stage Harmonic-Percussive Sound Separation (HPSS) relative to conventional VAD features; the separation reduces frame-wise detection error by up to 78%, depending on the SNR condition and noise type. The multi-feature approach was tested using hidden Markov models to model the feature stream as a sequence, and it outperformed standard and comparable VAD proposals in utterance-based tests intended for automatic speech recognition.
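To make the separation idea concrete, the following is a minimal sketch of a single HPSS stage on a magnitude spectrogram. It assumes the well-known median-filtering variant of HPSS with binary masks, which is swapped in here for illustration and is not necessarily the formulation used in this work. The names `median_filter_1d` and `hpss_median`, the `spec[freq][time]` layout, and the filter width are hypothetical choices made for the example.

```python
from statistics import median


def median_filter_1d(values, width):
    """Sliding median over a list; the window is truncated at the edges."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def hpss_median(spec, width=3):
    """Split a magnitude spectrogram spec[freq][time] into harmonic and
    percussive parts (illustrative median-filtering HPSS, binary masks)."""
    n_freq, n_time = len(spec), len(spec[0])

    # Smoothing along time keeps components that are steady in time
    # (horizontal ridges) and suppresses short transients.
    harm_enh = [median_filter_1d(row, width) for row in spec]

    # Smoothing along frequency keeps broadband transients
    # (vertical ridges) and suppresses narrow-band tones.
    columns = [[spec[f][t] for f in range(n_freq)] for t in range(n_time)]
    perc_enh = [median_filter_1d(col, width) for col in columns]

    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            # Binary mask: each bin goes to whichever estimate is larger
            # (ties go to the harmonic part).
            if harm_enh[f][t] >= perc_enh[t][f]:
                harmonic[f][t] = spec[f][t]
            else:
                percussive[f][t] = spec[f][t]
    return harmonic, percussive
```

Because the masks are binary, the two outputs always sum back to the input spectrogram. A multi-stage scheme would apply a separation step of this kind more than once, so that components fluctuating the way speech does can be isolated from both steadier and more transient noise; how the stages are configured is not specified in this abstract and is left open here.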