ISCA Archive Odyssey 2014

Comparison of human listeners and speaker verification systems using voice mimicry data

Ville Hautamäki, Rosa Gonzalez Hautamäki, Tomi Kinnunen, Anne-Maria Laukkanen

We consider voice mimicry, the imitation of another speaker's voice and speech characteristics. In this work, we analyze the performance of two well-known speaker verification systems against voice mimicry and compare it with a perceptual listening test on the same data. Our focus is to gain insight into how well listeners recognize speakers from their voice samples when mimicry data is included, and to compare this with the overall performance of two state-of-the-art speaker verification systems: a traditional Gaussian mixture model-universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring. For the studied material in the Finnish language, the mimicry attack increased the error rate only slightly, within a range acceptable for the general performance of the system (EER rose from 9% to 11%). Our data reveals that enhancing the audio material to minimize the differences between data collected in different environments improves the accuracy of the system even in the presence of imitated speech. The performance of the human listening panel shows that successfully imitated speech is difficult to recognize, and that it is even more difficult to recognize a person who is intentionally trying to modify his or her own voice. On average, a listener made 8 errors in the 34 selected trials.
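To make the two quantities named in the abstract concrete, the following is a minimal Python sketch of cosine scoring between two i-vectors and of an equal error rate (EER) estimate from trial scores. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine_score` and `equal_error_rate`, the plain-list vector representation, and the simple threshold sweep are all hypothetical choices made here, and i-vector extraction itself is not shown.

```python
import math


def cosine_score(w_enroll, w_test):
    """Cosine similarity between an enrollment and a test vector.

    Assumption: both vectors are plain sequences of floats of equal
    length and neither is the zero vector.
    """
    dot = sum(a * b for a, b in zip(w_enroll, w_test))
    norm_enroll = math.sqrt(sum(a * a for a in w_enroll))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_enroll * norm_test)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance rate (FAR) and
    the false rejection rate (FRR) are closest, and is reported as
    their mean at that point.
    """
    best_gap = float("inf")
    eer = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Target trials scoring below the threshold are falsely rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # Non-target trials at or above the threshold are falsely accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

In such a sketch, genuine (target) trials and impostor or mimicry (non-target) trials would each be scored with `cosine_score`, and the two score lists passed to `equal_error_rate`. For comparison with the listening panel, 8 errors in 34 trials corresponds to an error rate of roughly 23.5%, although that figure is a raw error proportion rather than an EER.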