Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features

Kelly, Finnian; Alexander, Anil; Forth, Oscar; Kent, Samuel; Lindh, Jonas; Åkesson, Joel

Assessing the perceptual similarity of voices is necessary for the creation of voice parades and for media applications such as voice casting. These applications are normally prohibitively expensive to administer, requiring significant amounts of ‘expert listening’. The ability to automatically assess voice similarity could benefit these applications by increasing efficiency and reducing subjectivity, while enabling the use of a much larger search space of candidate voices. In this paper, the use of automatically extracted phonetic features within an i-vector speaker recognition system is proposed as a means of identifying cohorts of perceptually similar voices. The features considered include formants (F1-F4), fundamental frequency (F0), semitones of F0, and their derivatives. To demonstrate the viability of this approach, a subset of the Interspeech 2016 special session ‘Speakers In The Wild’ (SITW) dataset is used in a pilot study comparing subjective listener ratings of similarity with the output of the automatic system. It is observed that the automatic system can locate cohorts of male voices with good perceptual similarity. In addition to these experiments, the proposal will be demonstrated with an application that allows a user to retrieve the voices in a large dataset that are perceptually most similar to their own.
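
To make the feature description above more concrete, the following is a minimal sketch of two of the steps the abstract mentions: expressing F0 in semitones and taking a derivative of a feature track, followed by a simple ranking of candidate voices. It is illustrative only and is not the authors' implementation. The 100 Hz reference frequency, the first-difference derivative, the cosine similarity score, and all function names (`hz_to_semitones`, `deltas`, `cosine`, `most_similar`) are assumptions introduced here; the abstract does not specify how the i-vector system scores voices.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def deltas(track):
    """First differences between consecutive frames (a simple stand-in derivative)."""
    return [b - a for a, b in zip(track, track[1:])]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(query, candidates, k=2):
    """Return the names of the k candidates most similar to the query vector.

    `candidates` maps a hypothetical speaker name to a fixed-length vector
    (for example, an embedding derived from the features above).
    """
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query, candidates[name]),
        reverse=True,
    )
    return ranked[:k]
```

For example, `most_similar(query_vector, cohort_vectors, k=10)` would return the ten nearest voices under this assumed score, which loosely mirrors the retrieval application described in the abstract.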

Cite as: Kelly, F., Alexander, A., Forth, O., Kent, S., Lindh, J., Åkesson, J. (2016) Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features. Proc. Interspeech 2016, 1567-1568

@inproceedings{kelly16_interspeech,
  author={Finnian Kelly and Anil Alexander and Oscar Forth and Samuel Kent and Jonas Lindh and Joel Åkesson},
  title={{Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={1567--1568}
}