ISCA Archive Interspeech 2019
ISCA Archive Interspeech 2019

Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction

Ju Lin, Sufeng Niu, Zice Wei, Xiang Lan, Adriaan J. van Wijngaarden, Melissa C. Smith, Kuang-Ching Wang

Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques directly operate on time-domain waveforms, which are often highly-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. By operating on log-power spectra, one can seamlessly include conventional spectral subtraction techniques, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and that it has a lower complexity.