ISCA Archive Interspeech 2024

HarmoNet: Partial DeepFake Detection Network based on Multi-scale HarmoF0 Feature Fusion

Liwei Liu, Huihui Wei, Dongya Liu, Zhonghua Fu

Audio DeepFake detection (ADD) has become increasingly challenging with the rise of spoofing attacks that use artificially generated audio. Track 2 of ADD 2023 requires not only detecting DeepFake audio but also locating the manipulated regions. To tackle this challenge, we propose HarmoNet, a framework that fuses multi-scale harmonic F0 and Wav2Vec features through an attention mechanism. This allows the model to effectively capture changes in each region of the utterance. Furthermore, we introduce a new loss function, Partial Loss, which places more weight on the boundaries between real and fake regions. Additionally, we design a post-processor to refine the output of the model. Our framework achieved a score of 70.61% in Track 2 of ADD 2023, an improvement of 67.12% over the baseline, and achieved the best performance. Moreover, HarmoNet shows competitive performance on other DeepFake datasets.