ISCA Archive Interspeech 2017

Detecting Overlapped Speech on Short Timeframes Using Deep Learning

Valentin Andrei, Horia Cucu, Corneliu Burileanu

This work demonstrates that deep learning techniques can detect overlapped speech on independent short timeframes. A secondary objective is to characterize how the duration of the signal frame influences the accuracy of the method. We trained a deep neural network with heterogeneous layers and obtained close to 80% inference accuracy on frames as short as 25 milliseconds. The proposed system provides higher detection quality than existing work and can predict overlapped speech with up to 3 simultaneous speakers. The method has low response latency and does not require a large amount of computing power.
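To make the notion of independent short timeframes concrete, the following minimal sketch splits a waveform into fixed-length 25 ms frames that a classifier could score one at a time. The 16 kHz sampling rate, the non-overlapping framing, the synthetic tone, and the helper name `split_into_frames` are assumptions made for illustration; they are not taken from the paper, and the network itself is not shown.

```python
import math


def split_into_frames(samples, sample_rate_hz, frame_ms=25.0):
    """Split a 1-D sequence of samples into non-overlapping fixed-length frames.

    Hypothetical helper: trailing samples that do not fill a whole frame
    are dropped, so every returned frame has the same length.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Assumed input: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate_hz = 16000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate_hz)
          for n in range(sample_rate_hz)]

# Each 25 ms frame holds 400 samples at 16 kHz and is treated independently.
frames = split_into_frames(signal, sample_rate_hz, frame_ms=25.0)
```

Under these assumptions each frame is a self-contained input, so a detector operating at this granularity needs to buffer only one frame of audio before it can produce a decision.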