ISCA Archive ICSLP 2002

Unsupervised speaker segmentation of telephone conversations

Aaron E. Rosenberg, Allen Gorin, Zhu Liu, S. Parthasarathy

A process for segmenting two-speaker telephone conversations by speaker, with no prior speaker models, is described and evaluated. The process consists of an initial segmentation using acoustic change and pause detection, followed by segment clustering, and then iterative modeling of the segment clusters and resegmentation. The technique has been evaluated on six customer care conversations, each approximately 3 minutes long. It does not resolve short (< 2 s) or overlapping segments very well, but it is capable of detecting longer segments (> 4 s) with miss rates on the order of 10% and confusion rates of 2% or less.
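
The three stages named in the abstract (initial segmentation, segment clustering, iterative modeling with resegmentation) can be illustrated with a minimal sketch. This is not the authors' implementation, and every specific choice here is an assumption made for illustration: the function name `segment_two_speakers`, the caller-supplied `init_bounds` standing in for acoustic change and pause detection, the nearest-seed clustering of segment means, the single diagonal Gaussian per speaker, and the moving-average smoothing of the log-likelihood ratio.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    # Per-frame log-likelihood under a diagonal-covariance Gaussian.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def segment_two_speakers(features, init_bounds, n_iter=5, smooth=25):
    """Label each frame 0 or 1 (hypothetical sketch, not the paper's method).

    features:    (T, D) array of frame-level features.
    init_bounds: sorted frame indices of an initial segmentation (assumed to
                 come from some change/pause detector, not implemented here).
    """
    # Stage 1: take the initial segments as given.
    edges = [0, *init_bounds, len(features)]
    segs = [(a, b) for a, b in zip(edges[:-1], edges[1:]) if b > a]
    seg_means = np.array([features[a:b].mean(axis=0) for a, b in segs])

    # Stage 2: cluster segments into two groups. The two most distant
    # segment means act as seeds; each segment joins the nearer seed.
    dist = np.linalg.norm(seg_means[:, None] - seg_means[None], axis=-1)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    seg_labels = (dist[:, j] < dist[:, i]).astype(int)

    labels = np.empty(len(features), dtype=int)
    for (a, b), lab in zip(segs, seg_labels):
        labels[a:b] = lab

    # Stage 3: alternate between modeling each cluster and relabeling frames.
    kernel_len = min(smooth, len(features))
    kernel = np.ones(kernel_len) / kernel_len
    for _ in range(n_iter):
        if min(np.sum(labels == 0), np.sum(labels == 1)) < 2:
            break  # one cluster is (almost) empty; stop rather than fit it
        ll = []
        for k in (0, 1):
            x = features[labels == k]
            ll.append(diag_gauss_loglik(features, x.mean(axis=0), x.var(axis=0) + 1e-6))
        # Smooth the log-likelihood ratio so isolated frames do not flip labels.
        llr = np.convolve(ll[1] - ll[0], kernel, mode="same")
        new_labels = (llr > 0).astype(int)
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
    return labels
```

The returned labels are arbitrary cluster indices, so any comparison against reference speaker labels has to allow for the two clusters being swapped. Real telephone speech would likely need a richer speaker model than one Gaussian per cluster; the sketch only shows how the stages feed into each other.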