ISCA Archive Interspeech 2024

Spoken-Term Discovery using Discrete Speech Units

Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau, Herman Kamper

Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea by proposing DUSTED: Discrete Unit Spoken-TErm Discovery. Leveraging self-supervised models, we encode input audio into sequences of discrete units. Inspired by alignment algorithms from bioinformatics, we find repeated speech patterns by searching for similar sub-sequences of units. Since discretization discards speaker information, DUSTED finds better matches across speakers, improving the coverage and consistency of the discovered patterns. We demonstrate these improvements on the ZeroSpeech Challenge, achieving state-of-the-art results on the spoken-term discovery track. Finally, we analyze the duration distribution of the patterns, showing that our method finds longer word- or phrase-like terms.
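As a rough illustration of the alignment idea mentioned above, the sketch below runs a Smith-Waterman-style local alignment over two sequences of integer unit IDs and returns the best-matching pair of sub-sequences. It is a minimal sketch, not the DUSTED implementation: the function name `local_align`, the scoring values (`match=1`, `mismatch=-1`, `gap=-1`), and the toy unit sequences are all assumptions chosen for illustration, since the abstract does not specify the scoring scheme or the unit inventory.

```python
def local_align(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two sequences of discrete units.

    Scoring values are illustrative assumptions, not taken from the paper.
    Returns (score, (a_start, a_end), (b_start, b_end)) with half-open spans.
    """
    n, m = len(a), len(b)
    # H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a fresh alignment here
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # skip a unit in a
                H[i][j - 1] + gap,      # skip a unit in b
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


# Toy unit sequences (hypothetical IDs) sharing the pattern [3, 12, 5, 9].
units_a = [7, 3, 12, 5, 9, 1]
units_b = [4, 4, 3, 12, 5, 9, 2]
score, span_a, span_b = local_align(units_a, units_b)
print(score, units_a[span_a[0]:span_a[1]], units_b[span_b[0]:span_b[1]])
```

In this toy setting the returned spans index directly into the unit sequences; mapping them back to time would require knowing the frame rate of the unit encoder, which is not stated in the abstract.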