A two-layer spiking neural network is used to segregate double vowels. The first layer is a partially connected network of spiking neurons of the relaxation-oscillator type, while the second layer consists of fully connected relaxation oscillators. A two-dimensional auditory image is computed from the enhanced spectrum of the cochlear filter-bank envelopes. Segregation is based on a channel-selection strategy: at each instant of time, each channel is assigned to one of the sources present in the auditory scene, i.e., one of the speakers. No prior pitch estimation for the underlying sources is required.
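The abstract does not specify the oscillator equations. A common choice for relaxation oscillators in oscillatory-correlation models of auditory scene analysis is the Terman-Wang unit, so the sketch below uses that formulation purely as an illustration; the parameter values and the function name `terman_wang` are assumptions, not taken from this work. With external input I > 0 the unit alternates between an active and a silent phase, which is the behavior the channel-selection layers rely on:

```python
import numpy as np

def terman_wang(I=0.8, eps=0.02, gamma=6.0, beta=0.1,
                dt=0.01, steps=40_000, x0=-2.0, y0=0.5):
    """Euler-integrate a single Terman-Wang relaxation oscillator.

    dx/dt = 3x - x^3 + 2 - y + I            (fast excitatory variable)
    dy/dt = eps * (gamma*(1 + tanh(x/beta)) - y)   (slow recovery variable)

    Parameter values here are illustrative, not from the paper.
    """
    x, y = x0, y0
    xs = np.empty(steps)
    for t in range(steps):
        dx = 3.0 * x - x**3 + 2.0 - y + I
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)
        x += dt * dx
        y += dt * dy
        xs[t] = x
    return xs

# A stimulated unit (I > 0) cycles between x ~ +2 (active) and x ~ -2 (silent),
# so its trajectory repeatedly crosses zero.
xs = terman_wang(I=0.8)
crossings = int(np.sum(np.diff(np.sign(xs)) != 0))
print(crossings)
```

In a full segregation network, units driven by channels belonging to the same source would be coupled excitatorily so that they synchronize, while a global inhibitor desynchronizes groups belonging to different sources.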