ISCA Archive Clarity 2025

Non-Intrusive Multi-Branch Speech Intelligibility Prediction using Multi-Stage Training

Ryandhimas E. Zezario, Szu-Wei Fu, Dyah A.M.G. Wisnu, Hsin-Min Wang, Yu Tsao
We propose an improved multi-branch speech intelligibility prediction model (iMBI-Net) for the third edition of the Clarity Prediction Challenge (CPC III). We develop three systems. iMBI-Net integrates spectral, waveform, and Whisper-based features with severity-level audiogram information, processes them through a multi-branch convolutional bidirectional long short-term memory (BLSTM) module with attention mechanisms, and is trained in multiple stages. iMBI-Net-R adds a single refinement module to iMBI-Net. iMBI-Net-R2 incorporates three refinement modules that use different acoustic inputs and combines their scores via ensembling. Experimental results show that all three variants perform well, with iMBI-Net placing third at CPC III, which supports the effectiveness of our approach.
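The score-combination step described for iMBI-Net-R2 can be illustrated with a minimal sketch. The abstract does not state how the ensemble is formed, so the unweighted mean used here is an assumption, as are the function name `ensemble_scores`, the module labels, and the clipping of predictions to a 0-100 intelligibility range.

```python
from statistics import fmean


def ensemble_scores(module_scores):
    """Average per-utterance predictions across refinement modules.

    module_scores maps a (hypothetical) module label to its list of
    predicted intelligibility scores, one per utterance. The unweighted
    mean is an assumed ensembling rule, not one confirmed by the source.
    """
    lengths = {len(scores) for scores in module_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every module must score the same utterances")

    combined = []
    # zip(*...) regroups the scores so each tuple holds one utterance's
    # predictions from all modules.
    for utterance_scores in zip(*module_scores.values()):
        mean_score = fmean(utterance_scores)
        # Clip to the assumed valid range of 0-100.
        combined.append(min(100.0, max(0.0, mean_score)))
    return combined


# Illustrative values only; these are not results from the paper.
example = {
    "module_a": [10.0, 50.0],
    "module_b": [30.0, 70.0],
    "module_c": [20.0, 60.0],
}
print(ensemble_scores(example))
```

A weighted mean or a learned fusion layer would fit the same interface. The unweighted mean is shown only because it is the simplest rule that needs no extra parameters.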