The multiband excitation (MBE) vocoder represents a speech signal by a pitch, a set of band magnitudes, and a voiced/unvoiced (V/UV) decision for each spectral band. In the conventional MBE model, the parameters are estimated sequentially in two steps: the pitch and band magnitudes are first estimated by analysis-by-synthesis (AbS) in the frequency domain under the assumption of a voiced speech model, and the V/UV decisions are made afterwards. However, the synthetic spectrum obtained under this assumption may exhibit large spectral distortion when the speech frame is strongly unvoiced, as in a transient region. In this paper, we propose a joint estimation method that estimates and decides all model parameters within the AbS loop. To this end, both voiced and unvoiced speech models are applied to each band during the analysis procedure. After the parameters have been estimated under the two models, the model that yields the smaller spectral estimation error is selected for each band. Analysis of the short-time spectrum and the long-time spectrogram shows that the speech reproduced by the proposed model is superior to that of the conventional one. An informal listening test also confirms the superiority of the proposed model.
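As an illustration of the per-band selection step described above, the following Python sketch fits a voiced and an unvoiced template to each band of a magnitude spectrum and keeps whichever gives the smaller squared error. It is a simplified sketch and not the estimator used in this work: the function names (`fit_band`, `select_band_models`), the Gaussian lobe standing in for the analysis-window spectrum, the flat template for the unvoiced model, and the `half_width` parameter are all assumptions made for the example. The band edges and harmonic centres are taken as given for one pitch candidate; in a full AbS loop this selection would presumably be repeated for each candidate.

```python
import math


def fit_band(spectrum, template):
    """Least-squares gain of `template` against `spectrum`.

    Returns (gain, squared_error).
    """
    num = sum(s * t for s, t in zip(spectrum, template))
    den = sum(t * t for t in template)
    gain = num / den if den > 0 else 0.0
    err = sum((s - gain * t) ** 2 for s, t in zip(spectrum, template))
    return gain, err


def select_band_models(mag, band_edges, harmonic_bins, half_width=2.0):
    """Choose a voiced ("V") or unvoiced ("UV") model for each band.

    mag           : magnitude spectrum (list of floats)
    band_edges    : list of (lo, hi) bin ranges, hi exclusive
    harmonic_bins : harmonic centre bin for each band
    half_width    : width of the Gaussian lobe (assumed stand-in
                    for the analysis-window spectrum)
    """
    decisions = []
    for (lo, hi), centre in zip(band_edges, harmonic_bins):
        band = mag[lo:hi]
        # Voiced template: a lobe peaked at the harmonic (assumption).
        voiced_t = [math.exp(-0.5 * ((k - centre) / half_width) ** 2)
                    for k in range(lo, hi)]
        # Unvoiced template: flat magnitude across the band (assumption).
        unvoiced_t = [1.0] * (hi - lo)

        gain_v, err_v = fit_band(band, voiced_t)
        gain_u, err_u = fit_band(band, unvoiced_t)

        # Keep the model with the smaller spectral estimation error.
        if err_v <= err_u:
            decisions.append(("V", gain_v, err_v))
        else:
            decisions.append(("UV", gain_u, err_u))
    return decisions


if __name__ == "__main__":
    # Toy spectrum: one peaked (harmonic-like) band, one flat band.
    peaked = [3.0 * math.exp(-0.5 * ((k - 4) / 2.0) ** 2) for k in range(8)]
    flat = [1.0] * 8
    for label, gain, err in select_band_models(
            peaked + flat, [(0, 8), (8, 16)], [4, 12]):
        print(label, round(gain, 3), round(err, 6))
```

Because both models are fitted before the decision is taken, a strongly unvoiced band is never forced through the voiced template, which is the source of distortion that the sequential procedure is said to suffer from.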