Voice biometrics for user authentication aims to provide convenient, robust and secure authentication of speakers. In this work we investigate the use of state-of-the-art text-independent and text-dependent speaker verification technology for user authentication. We evaluate three different authentication conditions: global digit strings, speaker-specific digit strings and prompted digit strings. Harnessing the characteristics of these condition types can provide benefits such as authentication that is transparent to the user (convenience), robustness to spoofing (security) and improved accuracy (reliability). The systems were evaluated on a corpus collected by Wells Fargo Bank which consists of 750 speakers. We show how to adapt techniques such as joint factor analysis (JFA), i-vectors, Gaussian mixture models with nuisance attribute projection (GMM-NAP) and hidden Markov models with NAP (HMM-NAP) to obtain improved results for new authentication scenarios and environments.
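The nuisance attribute projection (NAP) mentioned above removes a subspace that captures unwanted variability (e.g. channel effects) from a supervector. As an illustrative sketch only, assuming the nuisance subspace is given as a list of orthonormal basis vectors (the actual subspace is estimated from training data, which is not shown here):

```python
def nap_project(x, V):
    # NAP: remove the nuisance (e.g. channel) subspace from supervector x,
    # i.e. x' = x - V V^T x, with V given as orthonormal basis vectors.
    # Both x and each v in V are plain lists of floats (illustrative).
    out = list(x)
    for v in V:
        coef = sum(a * b for a, b in zip(v, x))  # projection coefficient onto v
        out = [o - coef * a for o, a in zip(out, v)]
    return out
```

Because the basis vectors are orthonormal, each nuisance direction can be subtracted independently, as done in the loop.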
Overall, equal error rates (EERs) well below 1% have been obtained for the matched channel condition, while the error almost triples under channel mismatch.
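The EER is the operating point at which the false-rejection rate (FRR) on target trials equals the false-acceptance rate (FAR) on impostor trials. A minimal sketch of computing it by sweeping a decision threshold over the observed scores (function and variable names are illustrative, not from the paper):

```python
def eer(target_scores, impostor_scores):
    # Sweep thresholds over all observed score values; the EER is the
    # point where the false-rejection rate (FRR) on target trials equals
    # the false-acceptance rate (FAR) on impostor trials.
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer
```

With perfectly separated score distributions this returns 0; with fully overlapping ones it approaches 0.5 (chance).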
In order to use advanced techniques such as JFA and i-vectors in a realistic low-latency system, we have developed the JFAlight method and an efficient i-vector extraction method for fast approximate JFA and i-vector scoring. Using these algorithms we brought the speed of the JFA and i-vector methods in line with the widely used NAP method.
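Once an i-vector has been extracted per utterance, a common low-latency scoring rule (not necessarily the one used in this work) is the cosine similarity between the enrollment and test i-vectors, which costs only one dot product per trial:

```python
import math

def cosine_score(w_enroll, w_test):
    # Cosine similarity between two i-vectors (plain lists of floats):
    # a standard fast scoring rule needing only a dot product and norms.
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm = (math.sqrt(sum(a * a for a in w_enroll))
            * math.sqrt(sum(b * b for b in w_test)))
    return dot / norm
```

The resulting score is then compared against a threshold, e.g. one calibrated to the desired FAR/FRR trade-off.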