ISCA Archive Interspeech 2023

GL-SSD: Global and Local Speech Style Disentanglement by vector quantization for robust sentence boundary detection in speech stream

Kuncai Zhang, Wei Zhou, Pengcheng Zhu, Haiqing Chen

Sentence boundary detection (SBD) in speech, which segments an audio stream into sentence units, plays a significant role in a broad range of tasks such as automatic speech recognition and speech translation. Previous studies have explored solutions based on basic acoustic features and high-level semantic representations. Although widely studied, SBD remains a challenge when applied across different speech styles, both global and local. To improve the robustness of SBD across speech styles, we propose Global and Local Speech Style Disentanglement (GL-SSD), which uses vector quantization to disentangle style representations from the raw speech and incorporates them into the semantic representation. Experiments demonstrate that the proposed method outperforms other recent mainstream methods.
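As a rough illustration of the vector quantization step mentioned above, the following minimal sketch maps continuous feature vectors to their nearest codebook entries. It is not the authors' implementation: the abstract does not say how the global and local styles are computed, so the sketch assumes that the global style is quantized from an utterance-level mean of the frames and the local style from each frame individually. The function names (`quantize`, `mean_pool`, `global_and_local_codes`), the toy codebooks, and the two-dimensional features are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(frame: Vector, codebook: Sequence[Vector]) -> Tuple[int, List[float]]:
    """Return the index and a copy of the codebook entry nearest to `frame`
    under squared Euclidean distance (ties go to the lowest index)."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, codebook[i])),
    )
    return best_index, list(codebook[best_index])


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Average the frames over time to get one utterance-level vector."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def global_and_local_codes(
    frames: Sequence[Vector],
    global_codebook: Sequence[Vector],
    local_codebook: Sequence[Vector],
) -> Tuple[int, List[int]]:
    """ASSUMPTION (not from the paper): the global style code comes from the
    pooled utterance vector, the local style codes from individual frames."""
    global_index, _ = quantize(mean_pool(frames), global_codebook)
    local_indices = [quantize(frame, local_codebook)[0] for frame in frames]
    return global_index, local_indices


if __name__ == "__main__":
    # Toy two-dimensional "speech features"; real systems use learned encoders.
    frames = [[0.1, 0.0], [0.9, 1.0], [1.1, 0.9]]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(global_and_local_codes(frames, codebook, codebook))
```

In a trained system the codebooks would be learned jointly with the encoder, and the selected code vectors (rather than bare indices) would be combined with the semantic representation; the sketch covers only the nearest-neighbour lookup.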