ISCA Archive Interspeech 2023

pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe

Hervé Bredin

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Version 2.1 introduces a major overhaul of the default pyannote.audio speaker diarization pipeline, which is made of three main stages: speaker segmentation applied to a short sliding window, neural speaker embedding of each (local) speaker, and (global) agglomerative clustering. One of the main objectives of the toolkit is to democratize speaker diarization. Therefore, on top of a pretrained speaker diarization pipeline that gives good results out of the box, we also provide a recipe that practitioners can follow to improve its performance on their own (manually annotated) dataset. The pipeline has been used in various challenges and reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022.
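To make the third stage more concrete, here is a minimal sketch of how local speakers from different windows might be grouped into global speakers by agglomerative clustering. It is an illustration only and not the pyannote.audio implementation: the function names, the toy 2-D embeddings, the centroid linkage, the cosine distance, and the threshold value of 0.3 are all assumptions chosen for readability.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Return the element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def agglomerative_clustering(embeddings, threshold):
    """Merge the two closest clusters until none are closer than threshold.

    Hypothetical helper: centroid linkage and cosine distance are
    assumptions made for this sketch.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            ci = centroid([embeddings[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([embeddings[k] for k in clusters[j]])
                d = cosine_distance(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Convert the cluster membership lists into one label per embedding.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels


# Toy embeddings, one per (window, local speaker) pair; values are made up.
local_embeddings = [
    [1.0, 0.1],  # window 0, local speaker 0
    [0.1, 1.0],  # window 0, local speaker 1
    [0.9, 0.2],  # window 1, local speaker 0
    [0.2, 0.9],  # window 1, local speaker 1
    [1.0, 0.0],  # window 2, local speaker 0
]
global_labels = agglomerative_clustering(local_embeddings, threshold=0.3)
print(global_labels)
```

In this toy run, the five local speakers collapse into two global speakers. In a real system the embeddings would come from a neural speaker embedding model and the threshold would be tuned on annotated data.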


doi: 10.21437/Interspeech.2023-105

Cite as: Bredin, H. (2023) pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe. Proc. INTERSPEECH 2023, 1983-1987, doi: 10.21437/Interspeech.2023-105

@inproceedings{bredin23_interspeech,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1983--1987},
  doi={10.21437/Interspeech.2023-105},
  issn={2958-1796}
}