ISCA Archive Odyssey 2001

Blind segmentation of a multi-speaker conversation using two different sets of features

Yaakov Metzger

An algorithm is presented for labeling a two-speaker phone call according to the active speaker in each time frame. The algorithm first clusters audio frames using one feature set, then models the speakers within each cluster and iteratively resegments using a different feature set. The first clustering stage is expected to yield clusters that contain audio from both speakers, grouped according to the phonetic units of speech. The second stage is expected to separate each of those clusters by speaker, since the textual content within each cluster is more uniform. Methods for measuring algorithm performance on the blind segmentation task are discussed. The algorithm's performance is measured on conversations from the SPIDRE database.
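The two-stage idea can be illustrated with a minimal sketch. It assumes k-means for the first clustering stage (over feature set A) and a single diagonal-Gaussian model per speaker for the iterative resegmentation inside each cluster (over feature set B). These modeling choices, the cluster count, and every function name below are hypothetical and are not taken from the paper. The abstract also does not say how the two per-cluster speaker groups are linked into two global speakers, so the sketch stops at per-cluster labels.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = _mean(members)
    return labels


def _diag_gauss(vectors: List[Vector], floor: float = 1e-3):
    # Hypothetical speaker model: one Gaussian with diagonal covariance.
    mu = _mean(vectors)
    var = [max(sum((v[d] - mu[d]) ** 2 for v in vectors) / len(vectors), floor)
           for d in range(len(mu))]
    return mu, var


def _loglik(x: Vector, model) -> float:
    mu, var = model
    return -0.5 * sum(math.log(2 * math.pi * s) + (xi - m) ** 2 / s
                      for xi, m, s in zip(x, mu, var))


def split_two_speakers(frames: List[Vector], iters: int = 10) -> List[int]:
    # Initial split by 2-means, then iterate: fit two speaker models, reassign frames.
    labels = kmeans(frames, 2)
    for _ in range(iters):
        groups = [[f for f, lab in zip(frames, labels) if lab == s] for s in (0, 1)]
        if not groups[0] or not groups[1]:
            break
        models = [_diag_gauss(g) for g in groups]
        new = [max((0, 1), key=lambda s: _loglik(f, models[s])) for f in frames]
        if new == labels:
            break
        labels = new
    return labels


def two_stage_segmentation(feats_a: List[Vector], feats_b: List[Vector],
                           n_clusters: int) -> List[Tuple[int, int]]:
    """Return (cluster id, per-cluster speaker label) for every frame."""
    clusters = kmeans(feats_a, n_clusters)            # stage 1: feature set A
    result = [(c, 0) for c in clusters]
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(clusters) if lab == c]
        if len(idx) < 2:
            continue
        sub = split_two_speakers([feats_b[i] for i in idx])  # stage 2: feature set B
        for i, s in zip(idx, sub):
            result[i] = (c, s)
    return result
```

Speaker labels are only meaningful within a cluster: label 0 in one cluster need not be the same speaker as label 0 in another, which is exactly the linking step the sketch leaves open.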