ISCA Archive Interspeech 2022

Overlapped Speech Detection in Broadcast Streams Using X-vectors

Lukas Mateju, Frantisek Kynych, Petr Cerva, Jiri Malek, Jindrich Zdansky

This work introduces a new approach to overlapped speech detection (OSD). It is designed for real-time processing of streamed data and uses x-vectors as its input features. Because the same x-vectors can also serve the related task of speaker diarization, the approach reduces computational demands across the entire streaming data processing chain. In our method, the x-vectors are extracted using a feed-forward sequential memory network (FSMN) and then fed into a simple neural classifier that distinguishes speech from cross-talk; its output is smoothed by a decoder based on weighted finite-state transducers (WFSTs). The method is evaluated on a Czech/Slovak broadcast dataset, which we make publicly available, and on the AMI meeting corpus. Our online method yields solid performance while operating with a 2-second latency.
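To illustrate the smoothing step, the sketch below applies a two-state Viterbi search with a fixed cost for changing state to a sequence of classifier outputs. This is a simplified stand-in for a WFST-based decoder, not the authors' implementation: the function name `smooth_labels`, the `switch_penalty` parameter and its value, and the input probabilities are all assumptions made for illustration. The sketch also processes a complete sequence offline and does not model the 2-second latency of the online method.

```python
import math


def smooth_labels(overlap_probs, switch_penalty=2.0):
    """Smooth per-step overlap probabilities with a two-state Viterbi search.

    State 0 = single-speaker speech, state 1 = overlapped speech (cross-talk).
    switch_penalty is a hypothetical cost, in negative-log units, paid
    whenever the decoded label changes between consecutive steps.
    """
    if not overlap_probs:
        return []

    eps = 1e-10

    def emission_costs(p):
        # Negative log-likelihood of each state given overlap probability p.
        p = min(max(p, eps), 1.0 - eps)
        return (-math.log(1.0 - p), -math.log(p))

    cost = list(emission_costs(overlap_probs[0]))
    backpointers = []

    for p in overlap_probs[1:]:
        emit = emission_costs(p)
        new_cost = [0.0, 0.0]
        pointer = [0, 0]
        for state in (0, 1):
            stay = cost[state]
            switch = cost[1 - state] + switch_penalty
            if stay <= switch:
                new_cost[state] = stay + emit[state]
                pointer[state] = state
            else:
                new_cost[state] = switch + emit[state]
                pointer[state] = 1 - state
        cost = new_cost
        backpointers.append(pointer)

    # Trace back the cheapest path from the best final state.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


# An isolated spike is suppressed, while a sustained run is kept.
spike = smooth_labels([0.1, 0.1, 0.9, 0.1, 0.1])
run = smooth_labels([0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1])
```

A larger `switch_penalty` makes the decoded labels more stable at the expense of missing short overlaps; a value of zero reduces the search to thresholding each probability at 0.5.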