ISCA Archive Odyssey 2018

Convolutional Neural Network Based Speaker De-Identification

Fahimeh Bahmaninezhad, Chunlei Zhang, John Hansen

The task of concealing speaker identity in speech signals is referred to as speaker de-identification, which helps protect the privacy of a speaker. Although both linguistic and paralinguistic features reveal personal information about a speaker and both need to be addressed, in this study we focus only on speaker voice characteristics. In other words, our goal is to move away from the source speaker identity while preserving naturalness and quality. The proposed speaker de-identification system maps the voice of a given speaker to an average (or gender-dependent average) voice; the mapping is modeled by a new convolutional neural network (CNN) encoder-decoder architecture. Here, the transformation of both spectral and excitation features is studied. The Voice Conversion Challenge 2016 (VCC-2016) database is used to develop the proposed method and examine its performance. We use two approaches for evaluation: (1) objective evaluation: equal error rates (EERs) calculated by a state-of-the-art i-vector/PLDA speaker recognition system range from 1.265% to 3.46% on average across all developed systems; and (2) subjective evaluation: a naturalness mean opinion score (MOS) of 2.8 was achieved. Both objective and subjective experiments confirm the effectiveness of our proposed de-identification method.
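The objective evaluation above is reported as an equal error rate (EER): the operating point at which the false-rejection (miss) rate equals the false-acceptance rate. As a rough illustration only, the sketch below computes an EER from two lists of verification scores. It assumes higher scores (for example, PLDA log-likelihood ratios) indicate "same speaker". The function name `compute_eer` is hypothetical, and this is a generic threshold sweep, not the authors' scoring pipeline or trial design.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate as a fraction (0.0346 means 3.46%).

    Assumption: higher scores indicate a same-speaker (target) trial.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # Miss / false-rejection rate: target trials scoring below the threshold.
    frr = np.array([np.mean(target_scores < t) for t in thresholds])
    # False-acceptance rate: non-target trials scoring at or above the threshold.
    far = np.array([np.mean(nontarget_scores >= t) for t in thresholds])

    # The EER lies where the two error curves cross; take the closest
    # threshold and average the two rates there.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

For example, perfectly separated scores such as `compute_eer([2, 3], [0, 1])` give an EER of 0. A finer-grained estimate would interpolate between thresholds (e.g., on a ROC curve), but the discrete sweep keeps the definition easy to follow.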

doi: 10.21437/Odyssey.2018-36

Cite as: Bahmaninezhad, F., Zhang, C., Hansen, J. (2018) Convolutional Neural Network Based Speaker De-Identification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2018), 255-260, doi: 10.21437/Odyssey.2018-36

  author={Fahimeh Bahmaninezhad and Chunlei Zhang and John Hansen},
  title={{Convolutional Neural Network Based Speaker De-Identification}},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2018)},