Recent gradient inversion attacks have shown that gradients shared in distributed learning systems can leak training data, particularly in the image and text domains. However, applying these attacks to audio data, especially longer speech sequences, remains largely unexplored. We introduce a novel approach that builds on gradient inversion to reconstruct high-quality audio recordings from shared gradients. We propose an optimized spectrogram segmentation technique that enables the recovery of longer audio sequences with diverse acoustic features, without requiring complex post-processing. This overcomes a key limitation of previous methods, which were restricted to short audio clips with simple acoustic features and limited semantic information.