ISCA Archive Interspeech 2025

How to Recover Long Audio Sequences Through Gradient Inversion Attack With Dynamic Segment-based Reconstruction

Xijie Zeng, Frank Rudzicz

Recent advances in gradient inversion attacks have demonstrated the vulnerability of shared gradients in distributed learning systems, particularly in the image and text domains. However, applying these techniques to audio data, especially longer speech sequences, remains largely unexplored. Our study introduces a novel approach that builds on the principles of gradient inversion attacks to reconstruct high-quality audio recordings from shared gradients. We propose an optimized spectrogram segmentation technique that enables the recovery of longer audio sequences with diverse acoustic features, without requiring complex post-processing. In doing so, we overcome the limitations of previous methods, which were restricted to short audio clips with simple acoustic features and limited semantic information.