In this work, we propose a fast, high-quality neural vocoder for implementation on CPUs. The two main approaches to fast inference with an autoregressive model are 1) subband-based vocoding and 2) multiple-sample prediction. Our previous work demonstrated that combining the two works well for simultaneous generation of up to two samples without quality degradation. To further increase the number of simultaneously generated samples while maintaining quality, we focus on the associations that exist between subband signals and multiple samples. Our proposed vocoder jointly models these associations with a multivariate Gaussian. Experiments show that our proposed four-sample vocoder is 1.47 times faster than its conventional two-sample counterpart. For acoustic features extracted from natural speech as well as those predicted by TTS, the proposed method generates up to four samples simultaneously without any significant degradation in naturalness. This vocoder also achieves naturalness comparable to that of the conventional two-sample method.
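The abstract says only that subband signals and multiple samples are modeled jointly with a multivariate Gaussian. The following is a minimal sketch built on our own assumptions and does not reproduce the paper's implementation. We assume that at each autoregressive step the network outputs a mean vector `mu` and a lower-triangular Cholesky factor `L` (positive diagonal) over one flat vector of `n_subbands * n_samples` values. The helper names `gaussian_nll`, `sample_joint`, and `to_grid` are hypothetical. Training would minimize the negative log-likelihood. Generation draws every subband/sample value in one step via `x = mu + L @ eps`, and the off-diagonal entries of `L` carry the correlations between subbands and between samples.

```python
import math
import random


def gaussian_nll(x, mu, L):
    """Negative log-likelihood of x under N(mu, L L^T).

    Assumption (hypothetical parameterization): L is lower triangular
    with a strictly positive diagonal, as a network might output it.
    """
    d = len(x)
    r = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution: solve L z = (x - mu), so ||z||^2 is the Mahalanobis term.
    z = [0.0] * d
    for i in range(d):
        s = r[i] - sum(L[i][j] * z[j] for j in range(i))
        z[i] = s / L[i][i]
    # log|Sigma| = 2 * sum(log L_ii) because Sigma = L L^T.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return 0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def sample_joint(mu, L, rng):
    """Draw one joint vector x = mu + L @ eps with eps ~ N(0, I)."""
    d = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Only j <= i contributes because L is lower triangular.
    return [mu[i] + sum(L[i][j] * eps[j] for j in range(i + 1)) for i in range(d)]


def to_grid(x, n_subbands, n_samples):
    """Reshape a flat vector (hypothetical layout: index = b * n_samples + t)
    into grid[subband][sample]."""
    return [[x[b * n_samples + t] for t in range(n_samples)] for b in range(n_subbands)]


if __name__ == "__main__":
    # Illustrative sizes only: 4 subbands x 4 samples -> 16-dim Gaussian.
    n_sb, n_smp = 4, 4
    d = n_sb * n_smp
    mu = [0.0] * d
    L = [[1.0 if i == j else (0.1 if j < i else 0.0) for j in range(d)] for i in range(d)]
    x = sample_joint(mu, L, random.Random(0))
    grid = to_grid(x, n_sb, n_smp)
    print(len(grid), len(grid[0]), gaussian_nll(x, mu, L))
```

Predicting the full covariance through its Cholesky factor keeps the covariance positive definite by construction and makes both the likelihood and sampling cheap triangular operations. A diagonal covariance would instead treat every subband and sample as conditionally independent within a step.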