We present a novel approach to blind syllable segmentation that combines model-based feature selection with data-driven classification. Specifically, we learn a function that maps each short-term energy peak of a speech utterance to either the vowel or the consonant class. The features used for classification capture spectral and energy signatures that are characteristic of the phonetic properties of English. The peaks classified as vowels then serve as the nuclei of our syllable segments. We demonstrate the effectiveness of the proposed method using nested cross-validation on 400 unique test utterances drawn at random from the TIMIT dataset, which together contain over 5000 syllables. Our hybrid approach achieves a lower insertion rate than state-of-the-art segmentation methods and a lower deletion rate than all baseline methods.
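
To make the notion of short-term energy peaks concrete, the following is a minimal sketch of how candidate peaks might be extracted before any vowel/consonant classification. It is not the method evaluated here: the frame length, hop size, relative height threshold, and the synthetic test signal are all illustrative assumptions, and the function names are hypothetical. The classifier and its spectral and energy features are deliberately left out, since the abstract does not specify them.

```python
import math

def short_term_energy(signal, frame_len, hop):
    """Return the sum of squared samples for each frame (rectangular window)."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def find_energy_peaks(energies, min_height):
    """Return indices of strict local maxima with energy >= min_height."""
    return [
        i for i in range(1, len(energies) - 1)
        if energies[i] >= min_height
        and energies[i - 1] < energies[i] > energies[i + 1]
    ]

def synthetic_utterance(sample_rate=8000, duration=0.6, centers=(0.15, 0.45)):
    """Toy signal (assumption): a 200 Hz tone shaped by two Gaussian bursts."""
    width = 0.03  # burst standard deviation in seconds, chosen arbitrarily
    signal = []
    for n in range(round(sample_rate * duration)):
        t = n / sample_rate
        envelope = sum(math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers)
        signal.append(envelope * math.sin(2 * math.pi * 200 * t))
    return signal

def peak_times(signal, sample_rate=8000, frame_len=200, hop=80, rel_height=0.1):
    """Return frame-centre times (seconds) of candidate energy peaks.

    frame_len=200 and hop=80 correspond to 25 ms frames every 10 ms at 8 kHz;
    rel_height discards peaks below a fraction of the maximum frame energy.
    All three values are illustrative, not taken from the paper.
    """
    energies = short_term_energy(signal, frame_len, hop)
    peaks = find_energy_peaks(energies, rel_height * max(energies))
    return [(i * hop + frame_len / 2) / sample_rate for i in peaks]

if __name__ == "__main__":
    # Each returned time is a candidate nucleus that a classifier would label.
    print(peak_times(synthetic_utterance()))
```

On the toy signal, the sketch returns one candidate time near the centre of each burst. In a full pipeline, each candidate would then be passed to the learned vowel/consonant classifier, and only the vowel peaks would be kept as syllable nuclei.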