ISCA Archive Clarity 2025

Speech intelligibility prediction based on syllable tokenizer

Szymon Drgas
This report describes an intrusive system for speech intelligibility prediction. It is based on a pre-trained SD-HuBERT, a neural network that transforms a speech signal into a sequence of embeddings corresponding to syllable-like segments. I propose a neural network that compares such sequences of embeddings using a bilinear architecture. Experimental results show that the proposed system outperforms the HASPI baseline on the CPC3 data set. Adding internal HASPI features to the proposed system improves the results further.
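
To make the idea of a bilinear comparison between two embedding sequences concrete, the following is a minimal sketch and not the architecture used in this report. It assumes that each syllable-like segment is represented by one fixed-size vector, that pairs of segments are scored as x^T W y, and that the scores are pooled by a max over processed segments, a mean over reference segments, and a sigmoid. The function names, the pooling scheme, the `scale` and `bias` parameters, and the random vectors standing in for SD-HuBERT embeddings are all illustrative assumptions.

```python
import numpy as np


def bilinear_similarity(ref, proc, W):
    """Score every (reference, processed) segment pair as ref_i^T W proc_j.

    ref:  (n_ref, dim) embeddings of the clean reference signal
    proc: (n_proc, dim) embeddings of the processed signal
    W:    (dim, dim) bilinear weight matrix (learned in a real system)
    """
    return ref @ W @ proc.T


def predict_intelligibility(ref, proc, W, scale=1.0, bias=0.0):
    """Map two embedding sequences to a score in (0, 1).

    The pooling below is an assumption made for illustration only.
    """
    sim = bilinear_similarity(ref, proc, W)  # shape (n_ref, n_proc)
    best_match = sim.max(axis=1)             # best processed segment per reference segment
    pooled = best_match.mean()               # average over reference segments
    return float(1.0 / (1.0 + np.exp(-(scale * pooled + bias))))


# Random vectors stand in for SD-HuBERT segment embeddings (hypothetical data).
rng = np.random.default_rng(0)
dim = 8
ref = rng.standard_normal((5, dim))
proc = ref[:4] + 0.1 * rng.standard_normal((4, dim))  # noisy copy with one segment missing
W = np.eye(dim) / dim                                  # untrained placeholder weights

score = predict_intelligibility(ref, proc, W)
print(score)
```

Because the two sequences may contain different numbers of segments, the sketch pools over the full similarity matrix instead of comparing segments position by position. In a trained system, `W`, `scale`, and `bias` would be fitted to listener scores.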