Using multiple microphones for speech enhancement allows for exploiting
spatial information for improved performance. In most cases, the spatial
filter is selected to be a linear function of the input, as is the case for
the minimum variance distortionless response (MVDR) beamformer. For
non-Gaussian distributed noise, however, the minimum mean square error
(MMSE) optimal spatial filter may be nonlinear.
Potentially, such
nonlinear functional relationships could be learned by deep neural
networks. However, the performance would depend on many parameters
and the architecture of the neural network. Therefore, in this paper,
we more generally analyze the potential benefit of nonlinear spatial
filters as a function of the multivariate kurtosis of the noise distribution.
The results imply that using a nonlinear spatial filter is only
worth the effort if the noise data follows a distribution with a multivariate
kurtosis that is considerably higher than that of a Gaussian. In this case,
we report a difference of up to 2.6 dB in segmental signal-to-noise
ratio (SNR) improvement for artificial stationary noise. We observe
an advantage of 1.2 dB for the nonlinear spatial filter over the linear
one even for real-world noise data from the CHiME-3 dataset given oracle
data for parameter estimation.
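
To make the notion of multivariate kurtosis concrete, the following minimal sketch estimates it from samples. It assumes Mardia's definition for real-valued data (the mean of the squared squared-Mahalanobis distances), which the text above does not specify and which may differ from the definition used in the paper, for instance for complex-valued spectral coefficients. The function name `mardia_kurtosis` and the synthetic Gaussian and Laplace data are illustrative assumptions only. Under this definition, a p-dimensional Gaussian has a kurtosis of p(p + 2).

```python
import numpy as np


def mardia_kurtosis(x):
    """Sample multivariate kurtosis (Mardia's definition, an assumption here).

    x: real-valued array of shape (n_samples, n_channels).
    """
    x = np.asarray(x, dtype=float)
    n, _ = x.shape
    centered = x - x.mean(axis=0)
    # Biased (maximum-likelihood) sample covariance matrix
    cov = centered.T @ centered / n
    # Squared Mahalanobis distance of every sample to the mean
    d2 = np.einsum("ij,ij->i", centered @ np.linalg.inv(cov), centered)
    return float(np.mean(d2 ** 2))


# Synthetic illustration: Gaussian versus heavier-tailed (Laplace) noise
rng = np.random.default_rng(0)
p = 3  # hypothetical number of microphones
k_gauss = mardia_kurtosis(rng.standard_normal((100_000, p)))
k_heavy = mardia_kurtosis(rng.laplace(size=(100_000, p)))

print(f"Gaussian reference p(p+2): {p * (p + 2)}")
print(f"Gaussian samples:          {k_gauss:.2f}")
print(f"Laplace samples:           {k_heavy:.2f}")
```

In this sketch, the Gaussian estimate lies close to the reference value p(p + 2), while the heavy-tailed samples yield a clearly larger value. This is the regime in which the analysis suggests a nonlinear spatial filter may pay off.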