ISCA Archive Interspeech 2007

Improved location features for meeting speaker diarization

Scott Otterson

This paper proposes several improvements to the correlation-based location features recently used in meeting speaker diarization. A speech-specific alternative to the generalized cross-correlation phase transform (GCC-PHAT) algorithm is tested and shown to provide equal or better results without noise reduction or continuity-enforcing smoothing. The limitations of a single correlation reference waveform are discussed, and it is shown how a multi-band energy ratio feature can help overcome them, yielding significantly improved performance. An all-pairs correlation is also proposed; when combined with energy ratios, it too improves upon the baseline system. However, the best combination is the baseline correlation features with energy ratios.
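For readers unfamiliar with the baseline, the following is a minimal sketch of textbook GCC-PHAT delay estimation between one microphone channel and a reference channel. It is not the speech-specific alternative proposed in the paper, and it omits any noise reduction or smoothing. The function names (`dft`, `gcc_phat_delay`), the `max_lag` parameter, and the synthetic test signal are assumptions made for illustration; a naive DFT is used only to keep the example free of external dependencies.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay of `sig` relative to `ref`, in samples.

    A positive result means `sig` lags `ref`. Hypothetical helper,
    not taken from the paper.
    """
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    S = dft(list(sig) + [0.0] * (n - len(sig)))
    R = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum of the two channels.
    cross = [s * r.conjugate() for s, r in zip(S, R)]
    # PHAT weighting: discard magnitude, keep only phase.
    eps = 1e-12
    whitened = [c / (abs(c) + eps) for c in cross]
    # Back to the lag domain; the peak marks the delay.
    cc = [v.real for v in dft(whitened, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cc[lag % n])


# Synthetic check (assumed data): white noise delayed by 5 samples.
random.seed(0)
ref = [random.gauss(0.0, 1.0) for _ in range(64)]
delayed = [0.0] * 5 + ref
estimated = gcc_phat_delay(delayed, ref, max_lag=10)
```

In a diarization front end, a delay of this kind would typically be computed per frame and per microphone against the reference channel, and the resulting delays used as location features.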