Kernel k-means and spectral clustering have been used to separate input-space clusters by means of non-linear mappings. In this paper we adapt and extend these methods to identify the constitutive units of speech: consonants, vowels and silences. The discovery of this structure is very useful for prosody-based systems for automatic language identification or language disorder detection. To find stable speech segments, infra-phonetic segmentation is performed using the forward-backward divergence algorithm. Our test corpus is a six-language subset of the OGI_MLTS corpus. We report better classification results than traditional approaches, as well as faster processing times.
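
For readers unfamiliar with the baseline technique, the following is a minimal sketch of plain kernel k-means, not the adapted method proposed in this paper. It assumes a Gaussian (RBF) kernel with a hand-picked width `gamma`. The function names `rbf_kernel` and `kernel_kmeans` and the optional `init_labels` argument are hypothetical, introduced only for illustration. Distances to cluster means are computed in feature space from the kernel matrix alone, so the non-linear mapping is never formed explicitly.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_kmeans(K, n_clusters, n_iter=50, seed=0, init_labels=None):
    """Cluster n points given their (n, n) kernel matrix K.

    Assumption: random initial labels unless init_labels is supplied.
    The result depends on the initialisation (local minima are common).
    """
    n = K.shape[0]
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, n_clusters, size=n)
    else:
        labels = np.asarray(init_labels).copy()

    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster attracts no points
            # ||phi(x_i) - m_c||^2
            #   = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            dist[:, c] = (
                diag
                - 2.0 * K[:, mask].sum(axis=1) / size
                + K[np.ix_(mask, mask)].sum() / size ** 2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

In practice the kernel width and the initialisation both strongly affect the partition, which is one reason the sketch exposes `init_labels`.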