Segmenting speech into phonemes benefits many spoken language processing applications. This paper proposes a novel method that uses auditory attention features to detect phoneme boundaries from the acoustic signal. The method requires neither a transcription nor acoustic models of phonemes. The auditory attention cues are biologically inspired and capture changes in sound characteristics using 2D spectro-temporal receptive filters. Experiments on TIMIT show that the proposed method successfully predicts phoneme boundaries and outperforms state-of-the-art phoneme segmentation methods.
Index Terms: speech segmentation, phoneme boundary detection, auditory attention model.