ISCA Archive Interspeech 2025

FFD: Fine-Finger Diffusion Model for Music to Fine-grained Finger Dance Generation

Boya Dong, Wentao Lei, Li Liu

Finger dance is an emerging social media trend that uses finger gestures for expression. Music-to-finger-dance generation is challenging because the movements are fine-grained. Existing music-driven methods often fail to model subtle finger motions, yielding poor performance. We propose Fine-Finger Diffusion (FFD), the first end-to-end framework for music-to-finger-dance generation. Our method employs a diffusion model to create rhythmically aligned finger movements while ensuring motion stability. A novel detail-aware loss (DAL) enhances temporal coherence by constraining inter-frame motion fluctuations. We introduce DanceFingers-4K, the first large-scale finger dance dataset, containing 4007 video clips with music-motion pairs. Comprehensive evaluations demonstrate FFD's superiority over existing approaches on objective metrics and in a user study.
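The abstract does not give the form of DAL, so the following is only an illustrative sketch of what "constraining inter-frame motion fluctuations" could look like, not the authors' loss. It assumes a motion sequence is a list of frames, each a flat list of keypoint coordinates, and penalizes the mismatch between predicted and reference frame-to-frame displacements. The function name `interframe_fluctuation_penalty` and the data layout are hypothetical.

```python
from typing import Sequence


def interframe_fluctuation_penalty(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between predicted and reference frame-to-frame
    displacements (a generic velocity-matching penalty, assumed here for
    illustration; it is not taken from the paper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    if len(pred) < 2:
        return 0.0  # no inter-frame displacement to compare

    total = 0.0
    count = 0
    for t in range(1, len(pred)):
        if len(pred[t]) != len(target[t]) or len(pred[t]) != len(pred[t - 1]):
            raise ValueError("all frames must have the same dimensionality")
        for j in range(len(pred[t])):
            # Displacement of coordinate j between consecutive frames.
            dv_pred = pred[t][j] - pred[t - 1][j]
            dv_tgt = target[t][j] - target[t - 1][j]
            total += (dv_pred - dv_tgt) ** 2
            count += 1
    return total / count if count else 0.0


# A jittery prediction against a static reference is penalized,
# while a prediction identical to the reference is not.
static = [[0.0], [0.0], [0.0]]
jitter = [[0.0], [1.0], [0.0]]
print(interframe_fluctuation_penalty(jitter, static))
print(interframe_fluctuation_penalty(static, static))
```

In a diffusion training setup, a term of this kind would typically be added to the denoising objective with a weighting coefficient; that weight and the exact combination are not specified in the abstract.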