ISCA Archive Interspeech 2015

Combining amplitude and phase-based features for speaker verification with short duration utterances

Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis

Due to the increasing use of fusion in speaker recognition systems, one trend in current research focuses on new features that capture information complementary to Mel-frequency cepstral coefficients (MFCC) in order to improve speaker recognition performance. The goal of this work is to combine (or fuse) amplitude- and phase-based features to improve speaker verification performance. Based on the amplitude and phase spectra, we investigate variations in the extraction of cepstral coefficients that produce diversity among the fused subsystems. Among the amplitude-based features we consider the widely used MFCC, linear frequency cepstral coefficients, and multitaper spectrum estimation-based MFCC (denoted here as MMFCC). As phase-based features we choose modified group delay, all-pole group delay, and linear prediction residual phase-based features. We also consider product spectrum-based cepstral coefficient features, which are influenced by both the amplitude and phase spectra. For performance evaluation, text-dependent speaker verification experiments are conducted on a proprietary dataset known as the Voice Trust-Pakistan (VT-Pakistan) corpus. Experimental results show that the fused system provides a reduced error rate compared to the individual amplitude- and phase-based systems. On average, the fused system provided a relative improvement of 37% over the baseline MFCC system in terms of EER, the detection cost function (DCF) of SRE 2008, and the DCF of SRE 2010.
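
To illustrate the kind of phase-based front end discussed above, the following minimal Python sketch computes modified group delay cepstral coefficients for a single windowed frame. The parameter values (alpha, gamma, the cepstral-smoothing order) and helper names are illustrative assumptions, not the settings or implementation used in the paper.

import numpy as np
from scipy.fftpack import dct

def modgd_cepstra(frame, n_fft=512, alpha=0.4, gamma=0.9,
                  n_smooth=30, n_ceps=13):
    """Modified group delay cepstral coefficients for one windowed frame.

    alpha, gamma and the cepstral-smoothing order n_smooth are
    illustrative defaults, not values reported in the paper.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n*x(n)

    # Cepstrally smoothed magnitude spectrum S(w), used to tame
    # spikes caused by zeros close to the unit circle
    log_mag = np.log(np.abs(X) + 1e-10)
    ceps = np.fft.irfft(log_mag, n_fft)
    ceps[n_smooth:-n_smooth] = 0.0       # keep only low-quefrency part
    S = np.exp(np.fft.rfft(ceps, n_fft).real)

    # Modified group delay function
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-10)
    tau = np.sign(tau) * (np.abs(tau) ** alpha)

    # Cepstral coefficients via DCT of the modified group delay function
    return dct(tau, type=2, norm='ortho')[:n_ceps]

Per-frame coefficients such as these would then feed a separate subsystem whose scores are fused with those of the amplitude-based subsystems; the fusion itself is typically done at the score level.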