ISCA Archive Interspeech 2015

Probabilistic linear discriminant analysis for robust speaker identification in co-channel speech

Navid Shokouhi, John H. L. Hansen

Co-channel speech refers to a monophonic audio recording in which at least two speakers are present. Meeting and telephone conversations recorded on a single channel are examples of co-channel speech. In this study, we address the problem of speaker identification (SID) for trials that contain co-channel speech in the train and/or test sessions. The assumption here is that i-vectors are available for all recordings and that we would like to compensate for interfering speech without requiring any changes or enhancements to the audio. This is an attractive approach, since state-of-the-art SID systems are built on i-vectors, so solutions that do not require alterations in the i-vector extraction stage are more convenient. We propose modifications to the standard probabilistic linear discriminant analysis (PLDA) formulation that enable more accurate estimates of the eigenvoice matrix in the presence of interfering speech and, consequently, more accurate statistics for speaker-dependent latent variables. The proposed co-channel PLDA formulation results in a 30% relative drop in equal error rate compared to the standard PLDA system for co-channel sessions with signal-to-interference ratios as low as 0 dB.
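
To make the signal-to-interference ratio (SIR) condition concrete, the following is a minimal sketch of how a co-channel mixture at a chosen SIR could be simulated. It is an illustration only: the abstract does not describe how the co-channel sessions were constructed, and the function name `mix_at_sir`, the 16000-sample length, and the use of random noise in place of real speech are all assumptions made for this example.

```python
import numpy as np


def mix_at_sir(target, interferer, sir_db):
    """Add `interferer` to `target` after scaling it to the requested SIR.

    Hypothetical helper for illustration; not taken from the paper.
    Returns the mixture and the gain applied to the interferer.
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)

    # Truncate both signals to a common length.
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]

    # Average power of each signal.
    p_target = np.mean(target ** 2)
    p_interferer = np.mean(interferer ** 2)

    # SIR_dB = 10 * log10(p_target / (gain^2 * p_interferer)),
    # solved for the gain applied to the interferer.
    gain = np.sqrt(p_target / (p_interferer * 10.0 ** (sir_db / 10.0)))
    return target + gain * interferer, gain


# Stand-in signals (random noise, assumed here in place of real speech).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interferer = 0.3 * rng.standard_normal(16000)

mixture, gain = mix_at_sir(target, interferer, sir_db=0.0)

# Verify the SIR that was actually achieved.
achieved_sir_db = 10.0 * np.log10(
    np.mean(target ** 2) / np.mean((gain * interferer) ** 2)
)
```

At 0 dB the scaled interferer carries the same average power as the target speaker, which is the hardest condition quoted in the abstract.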