Kernel additive modeling with generalized spatial Wiener filtering (GW) was recently presented for music/voice separation. In this paper, an adaptive auditory filter, called generalized weighted β-order MMSE estimation (WbE), is applied to the basic iterative kernel back-fitting algorithm to improve the separation of monaural music signals into music and voice components. In the proposed method, the perceptual weighting factor α and the singular value decomposition (SVD)-based factorized spectral amplitude exponent β are calculated adaptively for each kernel component so that WbE-based auditory filtering performs effectively. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.
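To make the roles of α and β concrete, the sketch below computes a per-bin spectral gain under one assumption: that the WbE filter follows the weighted β-order spectral amplitude estimator known from speech enhancement. In that estimator, α weights the error cost by the clean amplitude raised to the power α, and β is the compression exponent. The a priori SNR `xi` and a posteriori SNR `gamma` for each time-frequency bin are taken as given. The function names `wbe_gain` and `_log_kummer_m_b1` are hypothetical, and the adaptive, SVD-based rule that chooses α and β for each kernel component is not reproduced. The confluent hypergeometric functions are evaluated with Kummer's transformation. This turns the series into one with only positive terms, which can be summed in the log domain so that high-SNR bins do not overflow.

```python
import math


def _log_kummer_m_b1(a: float, v: float, max_terms: int = 200_000) -> float:
    """Return log M(a, 1; v) for a > 0, v >= 0 (Kummer's function, b = 1).

    With b = 1 every series term (a)_k v^k / (k!)^2 is positive, so the sum
    is accumulated stably in the log domain (log-sum-exp).
    """
    if a <= 0.0 or v < 0.0:
        raise ValueError("require a > 0 and v >= 0")
    if v == 0.0:
        return 0.0  # M(a, 1; 0) = 1
    log_v = math.log(v)
    log_term = 0.0  # log of the k = 0 term, which equals 1
    log_sum = 0.0
    for k in range(max_terms):
        # ratio t_{k+1} / t_k = (a + k) * v / (k + 1)^2
        log_term += math.log(a + k) + log_v - 2.0 * math.log(k + 1)
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))
        # past the peak (k > v) the terms shrink; stop once negligible
        if k > v and log_term < log_sum - 40.0:
            break
    return log_sum


def wbe_gain(xi: float, gamma: float, alpha: float, beta: float) -> float:
    """Hypothetical weighted beta-order MMSE amplitude gain for one bin.

    Assumed form (weighted beta-order spectral amplitude estimator):
        G = sqrt(v)/gamma * [Gamma((b+a)/2 + 1) M(-(b+a)/2, 1; -v)
                             / (Gamma(a/2 + 1) M(-a/2, 1; -v))]^(1/b),
    with v = xi/(1 + xi) * gamma, a = alpha, b = beta.
    Kummer's transformation M(c, 1; -v) = e^{-v} M(1 - c, 1; v) is applied to
    both M terms; the e^{-v} factors cancel in the ratio.
    """
    if not (alpha > -2.0 and alpha + beta > -2.0 and beta != 0.0):
        raise ValueError("require alpha > -2, alpha + beta > -2, beta != 0")
    if xi < 0.0 or gamma <= 0.0:
        raise ValueError("require xi >= 0 and gamma > 0")
    v = xi / (1.0 + xi) * gamma
    half_ab = (alpha + beta) / 2.0
    log_num = math.lgamma(half_ab + 1.0) + _log_kummer_m_b1(1.0 + half_ab, v)
    log_den = math.lgamma(alpha / 2.0 + 1.0) + _log_kummer_m_b1(1.0 + alpha / 2.0, v)
    return math.sqrt(v) / gamma * math.exp((log_num - log_den) / beta)
```

Under this assumed form, the estimated amplitude of a kernel component in a bin is `wbe_gain(xi, gamma, alpha, beta)` times the observed magnitude. Setting α = 0 and β = 1 recovers the classic MMSE short-time spectral amplitude gain, and α = 0 with β = 2 gives the MMSE spectral-power gain √(v(1+v))/γ. This makes the two exponents easy to sanity-check before they are adapted per component.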