ISCA Archive Interspeech 2025

Robust Personal Voice Activity Detection for Mitigating Domain Mismatch and False Acceptance Scenarios

Yuke Lin, Jun Chen, Wenjie Li, Longshuai Xiao, Chao Weng

Personal Voice Activity Detection (pVAD), which leverages pre-enrolled speaker information to identify the presence of a specific speaker, has been widely adopted in mobile devices. However, domain mismatch between enrollment and test data is common in real-world scenarios and causes significant performance degradation. In addition, existing pVAD models primarily optimize detection performance for the target speaker and often fail to address false acceptance, especially when interfering speakers have voice characteristics similar to the target's. To address these limitations, we propose a novel backbone with an auxiliary decoder and apply an embedding-updating method at inference time to improve performance under domain mismatch. Furthermore, we introduce an on-the-fly hard-sample data simulation strategy, which our experiments show significantly reduces the false acceptance rate.
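To make the notion of a hard sample concrete, the following is a minimal sketch of one plausible way to choose an interfering speaker during on-the-fly simulation: pick the non-target speaker whose embedding is most similar to the target's. This is an illustrative assumption, not the procedure used in the paper; the cosine-similarity criterion, the function names `cosine` and `pick_hard_interferer`, and the toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_hard_interferer(target_id, embeddings):
    """Return the non-target speaker whose embedding is closest to the target's.

    Assumption: "hard" means high cosine similarity to the target speaker.
    `embeddings` maps speaker id -> embedding vector (hypothetical format).
    """
    target = embeddings[target_id]
    candidates = [spk for spk in embeddings if spk != target_id]
    return max(candidates, key=lambda spk: cosine(target, embeddings[spk]))


# Toy speaker embeddings, made up for illustration only.
embeddings = {
    "target": [1.0, 0.0, 0.2],
    "spk_a": [0.9, 0.1, 0.3],   # close to the target: a hard interferer
    "spk_b": [-0.5, 1.0, 0.0],  # far from the target: an easy interferer
}
hard = pick_hard_interferer("target", embeddings)
```

In a training pipeline, speech from the selected speaker would then be mixed with or concatenated to the target's speech when simulating each sample, so that the model sees confusable interferers more often than random sampling would provide.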