ISCA Archive Eurospeech 1993

Speaker verification over telephone channels based on concatenated phonemic hidden Markov models

Johan de Veth, Guido Gallopyn, Hervé Bourlard

In this paper, we describe a speaker verification system for telephone channels that uses randomly prompted digit strings and concatenated context-dependent phonemic hidden Markov models (HMMs). The main goal of this work was to achieve acceptable speaker verification performance while keeping the number of parameters (and, consequently, the amount of training material) and the CPU requirements relatively small. To optimize the performance of this system, we combined several techniques that had previously been proposed separately, namely: (1) context-dependent phoneme models, (2) silence and garbage (click) models to remove extraneous segments from the actual utterance, (3) improved decision logic based on associated speakers (referred to as a "cohort" in [Rosenberg et al., 1992]), (4) improved feature vectors using "rasta" processing as suggested in [Hermansky et al., 1991], and (5) rejection of garbage utterances without significantly affecting the overall verification performance. We then show how we further improved this system by increasing the trial length and by automatic model adaptation. These two techniques allow us to achieve an average equal error rate (EER) of 0.2% on difficult (and realistic) tasks, which is 1.6 orders of magnitude smaller than that of the system we reported earlier [de Veth et al., 1993].
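Item (3) refers to normalizing the claimed speaker's score by the scores of a set of associated ("cohort") speakers, and the EER is the operating point at which false rejections and false acceptances occur equally often. The sketch below illustrates both ideas under stated assumptions: the utterance's HMM log-likelihoods are taken as already computed, the function names `cohort_score` and `equal_error_rate` are hypothetical, and normalizing by the average cohort likelihood is only one common variant (using the maximum cohort likelihood is another). It is not necessarily the exact decision rule used in this paper.

```python
import math
from typing import Optional, Sequence


def cohort_score(claimant_ll: float, cohort_ll: Sequence[float], n_frames: int) -> float:
    """Per-frame log-likelihood ratio of the claimed speaker's model versus the
    average likelihood of its cohort (associated speakers). Higher favors acceptance.

    Assumption: the cohort is summarized by its mean likelihood (one common choice).
    """
    # Log of the mean cohort likelihood, computed stably via log-sum-exp.
    top = max(cohort_ll)
    log_mean = top + math.log(sum(math.exp(ll - top) for ll in cohort_ll) / len(cohort_ll))
    return (claimant_ll - log_mean) / n_frames


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> Optional[float]:
    """Sweep the acceptance threshold over all observed scores and return the
    mean of the false rejection and false acceptance rates at the threshold
    where the two rates are closest."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer
```

In use, a trial would be accepted when its cohort score exceeds a chosen threshold; `equal_error_rate` then summarizes a set of genuine and impostor trial scores by the error rate at the threshold where the two error types balance.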
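Item (4) mentions "rasta" processing, which band-pass filters each log-spectral channel along the time axis so that slowly varying components, such as a fixed telephone channel response (additive in the log domain), are suppressed. The abstract does not specify the exact front end, so the sketch below assumes the widely quoted RASTA filter with numerator 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) and a single pole at 0.98 (some implementations use 0.94), applied causally to a frames-by-channels log spectrum. The names `rasta_filter` and `rasta` are hypothetical.

```python
from typing import List, Sequence


def rasta_filter(trajectory: Sequence[float], pole: float = 0.98) -> List[float]:
    """Band-pass filter one log-spectral channel along the time (frame) axis.

    Assumed filter: 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1),
    implemented causally, so the output carries a fixed delay of a few frames.
    """
    x = [0.0] * 4 + list(trajectory)  # zero history before the first frame
    y: List[float] = []
    prev = 0.0
    for i in range(4, len(x)):
        # FIR part: slope-like coefficients that sum to zero, so a constant
        # (DC) offset produces no sustained output.
        fir = 0.1 * (2 * x[i] + x[i - 1] - x[i - 3] - 2 * x[i - 4])
        # IIR part: a single pole sets the low-frequency cutoff.
        prev = fir + pole * prev
        y.append(prev)
    return y


def rasta(log_spectra: Sequence[Sequence[float]], pole: float = 0.98) -> List[List[float]]:
    """Filter each channel of a (frames x channels) log spectrum independently."""
    if not log_spectra:
        return []
    n_ch = len(log_spectra[0])
    cols = [rasta_filter([frame[c] for frame in log_spectra], pole) for c in range(n_ch)]
    return [[cols[c][t] for c in range(n_ch)] for t in range(len(log_spectra))]
```

Because the filter is linear and its FIR coefficients sum to zero, adding a constant to a channel only adds a transient that decays geometrically with the pole. This is the property that makes such filtering attractive for telephone speech, where the channel shifts each log-spectral channel by a roughly constant amount.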