ISCA Archive Interspeech 2023

A GAN Speech Inpainting Model for Audio Editing Software

Haixin Zhao

This paper proposes a generative adversarial network (GAN) speech inpainting model consisting of a GAN magnitude inpainting network and a phase reconstruction algorithm. The GAN, built on partial convolutions, inpaints specific time-frequency (T-F) areas of spectrograms; with the proposed loss function it captures latent information and high-dimensional features of speech spectrograms, yielding more realistic, speech-like results. The phase reconstruction algorithm adopts two strategies for different magnitudes, inpainting clear harmonics while reducing buzzing at high frequencies. The proposed model outperforms the conventional and T-F mask-based deep inpainting baselines in terms of Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ). Because it can inpaint specific T-F areas with improved performance, the model is suited to speech inpainting in audio editing software.
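
To illustrate how a partial convolution treats a masked T-F area, the following is a minimal single-channel sketch in NumPy. It follows the general partial-convolution formulation from the image inpainting literature (masked input, renormalization by the number of valid cells, mask update) and is an assumption for illustration only. The abstract does not specify the network's layers, and the names `partial_conv2d`, `spec`, and `mask` are hypothetical.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution, stride 1, no padding.

    x      : (F, T) magnitude spectrogram
    mask   : (F, T) binary mask, 1 = known T-F cell, 0 = cell to inpaint
    weight : (kh, kw) convolution kernel
    Returns the filtered output and the updated mask.
    """
    kh, kw = weight.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    window_size = kh * kw
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                # Use only known cells, then rescale by window_size / valid
                # so the response does not depend on how many cells are missing.
                patch = x[i:i + kh, j:j + kw] * m
                out[i, j] = (weight * patch).sum() * (window_size / valid) + bias
                # The output cell counts as known once any input cell was known.
                new_mask[i, j] = 1.0
    return out, new_mask

# Toy example: a flat spectrogram with a rectangular T-F hole.
spec = np.ones((5, 6))
mask = np.ones((5, 6))
mask[1:4, 2:4] = 0.0            # hypothetical region selected for inpainting
kernel = np.full((3, 3), 1.0 / 9.0)
out, new_mask = partial_conv2d(spec, mask, kernel)
```

In this toy case the hole is narrower than the kernel, so every window sees at least one known cell and the updated mask is filled after a single layer. Larger holes would shrink gradually over stacked layers.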