ISCA Archive Interspeech 2012

Overlapped speech detection in meeting using cross-channel spectral subtraction and spectrum similarity

Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano

We propose an overlapped speech detection method for speech recognition and speaker diarization of meetings in which each speaker wears a lapel microphone. Two novel features are used as inputs to a GMM-based detector. The first is the speech power after cross-channel spectral subtraction, which reduces the power contributed by the other speakers. The second is an amplitude spectral cosine correlation coefficient, which effectively extracts the correlation of spectral components under relatively quiet conditions. We evaluated our method on a meeting speech corpus with four participants. The accuracy of the proposed method, 74.1%, was significantly better than that of the conventional method, 67.0%, which uses raw speech power and the power spectral Pearson's correlation coefficient.

Index Terms: overlap speech detection, spectral subtraction, cosine distance
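
As a rough illustration of the two features named in the abstract, the following minimal sketch computes a cross-channel spectral subtraction and a cosine correlation between amplitude spectra for a single frame. It is not the authors' implementation: the function names, the scaling factor `alpha`, the `floor` value, and the choice to subtract the sum of the other channels' power spectra are all assumptions made for illustration, since the abstract does not give the exact formulas. The Pearson coefficient is included only to show how it differs from the cosine measure (it removes the mean before correlating).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra, with no mean removal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-removed spectra."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def cross_channel_subtract(target_power, other_powers, alpha=1.0, floor=0.0):
    """Subtract the other channels' power from the target channel, per bin.

    alpha and floor are hypothetical parameters, not taken from the paper.
    """
    cleaned = []
    for k, power in enumerate(target_power):
        leak = sum(other[k] for other in other_powers)
        cleaned.append(max(power - alpha * leak, floor))
    return cleaned


# Toy single-frame power spectra (three frequency bins) for one target
# microphone and two other microphones; the values are made up.
target = [4.0, 9.0, 1.0]
others = [[1.0, 2.0, 0.5], [0.5, 1.0, 1.0]]

cleaned = cross_channel_subtract(target, others)
frame_power = sum(cleaned)  # candidate power feature for the detector

# Amplitude spectra are the square roots of the power spectra.
amp_target = [math.sqrt(p) for p in target]
amp_other = [math.sqrt(p) for p in others[0]]
similarity = cosine_similarity(amp_target, amp_other)
```

In a detector along the lines the abstract describes, per-frame values such as `frame_power` and `similarity` would be collected into feature vectors and passed to a GMM-based classifier; that stage is omitted here.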