ISCA Archive SpeechProsody 2008

Joint prosodic and spectral modeling for robust speaker verification

Yuan-Fu Liao, Wen-Chieh Chang, Zong-You Xie, Ding-Yun Zeng, Yau-Tarng Juang

In this paper, a joint prosodic and spectral modeling framework is proposed as an alternative to traditional score-domain fusion approaches, in order to alleviate the problem of mismatched channel, handset, and ambient noise conditions. The basic idea is to embed the hierarchical structure of speech prosody into an ergodic HMM (EHMM): the prosodic states, the transitions between prosodic states, and the prosodic/spectral features are modeled by the EHMM's states, state transition probabilities, and state-dependent observation distributions, respectively. Experimental results on the standard single-speaker detection task of the NIST 2001 speaker recognition evaluation (NIST-SRE 2001) showed that the proposed approach not only outperformed the spectral feature-based baseline (8.04% vs. 8.64% equal error rate, EER) but also slightly outperformed the score-domain fusion approach (8.44%).
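To make the three EHMM components concrete, the following is a minimal sketch of how an utterance could be scored with a fully connected HMM using the forward algorithm in the log domain. It is not the authors' implementation. The abstract does not specify the form of the observation distributions, so diagonal-covariance Gaussians are assumed here, and each frame is assumed to be a single vector that concatenates prosodic and spectral features. All function names, parameter names, and toy values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x (assumed emission model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def ehmm_log_likelihood(frames, log_init, log_trans, means, variances):
    """Forward algorithm for an ergodic HMM: every state may follow every state.

    frames    : list of joint feature vectors (prosodic + spectral, assumed concatenated)
    log_init  : log initial probability of each state
    log_trans : log_trans[r][s] = log probability of moving from state r to state s
    means, variances : per-state diagonal Gaussian parameters
    """
    n_states = len(log_init)
    # Initialise with the first frame.
    alpha = [
        log_init[s] + log_gauss_diag(frames[0], means[s], variances[s])
        for s in range(n_states)
    ]
    # Recurse over the remaining frames, summing over all predecessor states.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(n_states)])
            + log_gauss_diag(x, means[s], variances[s])
            for s in range(n_states)
        ]
    return logsumexp(alpha)


# Toy example: two states, two-dimensional joint features (illustrative values only).
frames = [[0.1, 1.0], [0.0, 1.2], [2.1, -0.9]]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [
    [math.log(0.7), math.log(0.3)],
    [math.log(0.4), math.log(0.6)],
]
means = [[0.0, 1.0], [2.0, -1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

score = ehmm_log_likelihood(frames, log_init, log_trans, means, variances)
```

In a verification setting, a score of this kind would typically be compared against the score of a background model, and the difference thresholded. That step is a common convention rather than something stated in the abstract.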

doi: 10.21437/SpeechProsody.2008-34

Cite as: Liao, Y.-F., Chang, W.-C., Xie, Z.-Y., Zeng, D.-Y., Juang, Y.-T. (2008) Joint prosodic and spectral modeling for robust speaker verification. Proc. Speech Prosody 2008, 143-146, doi: 10.21437/SpeechProsody.2008-34

  author={Yuan-Fu Liao and Wen-Chieh Chang and Zong-You Xie and Ding-Yun Zeng and Yau-Tarng Juang},
  title={{Joint prosodic and spectral modeling for robust speaker verification}},
  booktitle={Proc. Speech Prosody 2008},