ISCA Archive Interspeech 2018

Improving Gender Identification in Movie Audio Using Cross-Domain Data

Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan

Gender identification from audio is important for quantitative gender analysis in multimedia and for improving downstream tasks such as speech recognition. Robust gender identification requires speech segmentation that relies on accurate voice activity detection (VAD). Both tasks are challenging in movie audio because of its diverse and often noisy acoustic conditions. In this work, we acquire VAD labels for movie audio by aligning it with subtitle text and train a recurrent neural network model for VAD. Subsequently, we apply transfer learning to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification. To account for the diverse acoustic conditions in movie audio, we use audio clips from YouTube labeled for gender. We compare the performance of our proposed method with baseline experiments set up to assess the importance of the feature embeddings and the training data used for the gender identification task. For systematic evaluation, we extend an existing benchmark dataset for movie VAD to include precise gender labels. The VAD system achieves results comparable to the state of the art in the movie domain. The proposed gender identification system outperforms existing baselines, achieving an accuracy of 85% on movie audio. We have made the data and related code publicly available.
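The abstract derives VAD training labels from subtitles aligned with the audio. As a minimal sketch, assuming the subtitle cues have already been aligned to `(start, end)` times in seconds, those intervals could be turned into frame-level speech/non-speech targets for a frame-wise VAD model as shown below. The helper name `intervals_to_frame_labels`, the frame-center rule, and the 10 ms default hop are hypothetical illustration choices, not the authors' settings.

```python
from typing import List, Tuple


def intervals_to_frame_labels(
    intervals: List[Tuple[float, float]],
    duration: float,
    hop: float = 0.01,
) -> List[int]:
    """Hypothetical helper: label each frame 1 (speech) or 0 (non-speech).

    `intervals` are (start, end) times in seconds, assumed to come from
    subtitle cues already aligned to the audio; `hop` is the frame step in
    seconds. A frame counts as speech if its center lies in [start, end).
    """
    n_frames = int(round(duration / hop))
    labels = [0] * n_frames
    for start, end in intervals:
        # Clip each interval to the audio length and skip empty ones.
        start, end = max(0.0, start), min(duration, end)
        if end <= start:
            continue
        # Only scan frames near the interval; the explicit center test
        # below decides membership, so small float errors are harmless.
        first = max(0, int(start / hop) - 1)
        last = min(n_frames, int(end / hop) + 1)
        for i in range(first, last):
            center = (i + 0.5) * hop
            if start <= center < end:
                labels[i] = 1
    return labels
```

For example, `intervals_to_frame_labels([(0.5, 1.0)], duration=2.0, hop=0.1)` yields 20 frames, with frames 5 through 9 marked as speech. Real subtitle timings are often loose, so a practical pipeline would likely refine these boundaries (e.g. by forced alignment) before training.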

doi: 10.21437/Interspeech.2018-1462

Cite as: Hebbar, R., Somandepalli, K., Narayanan, S. (2018) Improving Gender Identification in Movie Audio Using Cross-Domain Data. Proc. Interspeech 2018, 282-286, doi: 10.21437/Interspeech.2018-1462

  author={Rajat Hebbar and Krishna Somandepalli and Shrikanth Narayanan},
  title={{Improving Gender Identification in Movie Audio Using Cross-Domain Data}},
  booktitle={Proc. Interspeech 2018},