Speaker recognition systems have been shown to work well when recordings are collected under conditions with relatively limited mismatch. A significant focus of current research is therefore on techniques that keep system performance robust when greater variability is present. This study considers a diverse data set of recordings collected in multiple rooms with different types of microphones. Partial least squares (PLS), a technique recently introduced to the speaker recognition community, is considered for decomposing the features and mitigating the performance degradation caused by room and/or microphone mismatch. The results suggest that PLS decomposition can provide substantial improvements in performance in the presence of mismatched recording conditions. These outcomes provide further validation of PLS decomposition and encourage its further consideration for reducing session and environment variability in speaker recognition tasks.
Index Terms: speaker recognition, partial least squares, subspace decomposition