Speaker segmentation is the task of detecting speaker turns in an audio stream. We propose a metric-based algorithm that uses discrete wavelet transform (DWT) features. Principal component analysis (PCA) or linear discriminant analysis (LDA) [1] is then applied to reduce the dimensionality of the feature space and remove redundant information. In our experiments, the resulting methods, referred to as DWT-PCA and DWT-LDA, are compared to the DISTBIC algorithm [2] on clean and noisy data from the TIMIT database. Under strong noise in particular, i.e. at -10 dB SNR, the DWT-PCA approach is very robust: the false alarm rate (FAR) increases by ~2% and the missed detection rate (MDR) stays about the same compared to clean speech, whereas the DISTBIC method fails, with an FAR of ~0% and an MDR of ~100%. For clean speech, DWT-PCA shows a relative improvement of ~30% in both FAR and MDR over the DISTBIC algorithm. DWT-LDA performs slightly worse than DWT-PCA.
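To make the DWT-PCA pipeline more concrete, the following is a minimal sketch of a metric-based change detector built on wavelet features. It is not the implementation evaluated here, and every specific in it is an assumption made for illustration: the Haar wavelet, log subband energies as features, the Euclidean distance between the mean feature vectors of two adjacent windows as the metric, and the function names `haar_dwt_features`, `pca_project`, and `window_distances`. The input is a synthetic two-segment signal, not TIMIT data.

```python
import numpy as np

def haar_dwt_features(frame, levels=4):
    """Log-energy of each Haar DWT subband (assumed feature type).
    The frame length must be divisible by 2**levels."""
    approx = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # high-pass (detail) coefficients
        approx = (even + odd) / np.sqrt(2.0)   # low-pass (approximation) coefficients
        feats.append(np.log(np.sum(detail ** 2) + 1e-12))
    feats.append(np.log(np.sum(approx ** 2) + 1e-12))
    return np.array(feats)

def pca_project(X, n_components):
    """Project mean-centred rows of X onto the leading principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def window_distances(Z, win):
    """Distance between the mean vectors of the windows left and right of frame t
    (an assumed, deliberately simple metric)."""
    d = np.zeros(len(Z))
    for t in range(win, len(Z) - win + 1):
        d[t] = np.linalg.norm(Z[t - win:t].mean(axis=0) - Z[t:t + win].mean(axis=0))
    return d

# Synthetic stand-in for two speakers: a low-frequency tone, then white noise.
rng = np.random.default_rng(0)
n = np.arange(40 * 256)
seg_a = np.sin(2 * np.pi * 0.01 * n) + 0.01 * rng.standard_normal(n.size)
seg_b = rng.standard_normal(n.size)
frames = np.concatenate([seg_a, seg_b]).reshape(-1, 256)   # 80 frames of 256 samples

X = np.array([haar_dwt_features(f) for f in frames])       # DWT features per frame
Z = pca_project(X, n_components=2)                          # reduced feature space
dist = window_distances(Z, win=10)
change = int(np.argmax(dist))                               # candidate speaker turn
print("candidate change point at frame", change)
```

In this sketch a speaker turn is hypothesised at the frame where the distance curve peaks; a practical detector would more likely pick all local maxima above a threshold, which is where false alarms and missed detections are traded off.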
[1] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.