ISCA Archive SLAM 2013

Speaker attribution of Australian broadcast news data

Houman Ghaemmaghami, David Dean, Sridha Sridharan

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for attribution of large sets of two-speaker telephone data. In this paper, we build on that approach to achieve a robust system that is applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker recordings (more than two speakers). We achieve this by using a robust cross-likelihood ratio (CLR) threshold as the stopping criterion for clustering, in place of the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module on a dataset of Australian broadcast news recordings and show that diarization accuracy degrades significantly without prior knowledge of the true number of speakers in a recording. We therefore propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system on the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
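
The idea of complete-linkage clustering with a threshold stopping criterion can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a precomputed, symmetric matrix of pairwise segment scores in which a higher value means two segments are more likely to come from the same speaker (as a CLR-style score would), and the function name `complete_linkage`, the toy scores, and the threshold of 0.0 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def complete_linkage(scores, threshold):
    """Agglomerative complete-linkage clustering on a symmetric similarity matrix.

    scores[i][j] is a similarity (higher = more likely the same speaker).
    Merging stops once no pair of clusters has a complete-linkage score
    at or above `threshold`, so the number of clusters is not fixed in advance.
    """
    clusters = [[i] for i in range(len(scores))]

    def linkage(a, b):
        # Complete linkage: a cluster pair is only as similar as its
        # least similar pair of members.
        return min(scores[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest complete-linkage score.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[a], clusters[b]) < threshold:
            break  # threshold stopping criterion
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy scores for five segments (hypothetical values, not real CLR scores).
toy_scores = [
    [0.0, 2.1, 1.8, -3.0, -2.5],
    [2.1, 0.0, 1.5, -2.8, -3.1],
    [1.8, 1.5, 0.0, -2.2, -2.9],
    [-3.0, -2.8, -2.2, 0.0, 1.9],
    [-2.5, -3.1, -2.9, 1.9, 0.0],
]
result = complete_linkage(toy_scores, threshold=0.0)
print(result)  # [[0, 1, 2], [3, 4]]
```

With a threshold in place of a fixed target of two clusters, the number of speakers is an output of the clustering rather than an input, which is what makes the approach usable on recordings with an unknown number of speakers.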

Index Terms: speaker attribution, diarization, linking, complete linkage, broadcast news.

Cite as: Ghaemmaghami, H., Dean, D., Sridharan, S. (2013) Speaker attribution of Australian broadcast news data. Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013), 72-77.

  author={Houman Ghaemmaghami and David Dean and Sridha Sridharan},
  title={{Speaker attribution of australian broadcast news data}},
  booktitle={Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)},