This paper describes our attempt to combine the relative merits of different indexing units (scales) and different retrieval models to improve performance in Chinese spoken document retrieval. Our study covers indexing units at three scales: words, character bigrams, and syllable bigrams. It also covers two retrieval models: the HMM-based model and the vector space model (VSM). Our retrieval task is based on the TDT-2 Mandarin collection, in which news text is used to retrieve relevant Mandarin audio. We experimented with different combinations of scale and retrieval model, measuring performance by mean average precision (mAP). The HMM-based model retrieves better at the word scale (mAP = 0.566), whereas the VSM retrieves better at the character bigram scale (mAP = 0.562). We then ran a series of integration experiments in which the ranked retrieval lists from different runs were combined by rank-based rescoring. The best retrieval performance (mAP = 0.591) was achieved by integrating the HMM-word and VSM-character configurations. These results suggest that retrieval at different scales and with different models captures different kinds of knowledge, which can be integrated to improve retrieval performance.
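To make the idea of combining ranked lists concrete, the following is a minimal sketch of one common form of rank-based fusion, a Borda-style sum of rank scores. It is an illustration only: the abstract does not specify the rescoring formula, so the scoring rule, the function name `fuse_by_rank`, the `weights` parameter, and the document IDs are all assumptions rather than the method used in the experiments.

```python
from collections import defaultdict


def fuse_by_rank(runs, weights=None):
    """Combine ranked lists of document IDs using a Borda-style rank score.

    runs: list of ranked lists, best document first.
    weights: optional per-run weights (hypothetical; equal weights by default).
    """
    if weights is None:
        weights = [1.0] * len(runs)
    # Size of the pooled candidate set across all runs.
    pool = {doc for run in runs for doc in run}
    n = len(pool)
    scores = defaultdict(float)
    for run, weight in zip(runs, weights):
        for position, doc in enumerate(run):
            # Rank 1 earns n points, rank 2 earns n - 1, and so on.
            # A document absent from a run earns nothing from that run.
            scores[doc] += weight * (n - position)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Toy ranked lists standing in for two retrieval runs (made-up document IDs).
hmm_word_run = ["d3", "d1", "d2"]
vsm_char_run = ["d1", "d4", "d3"]
fused = fuse_by_rank([hmm_word_run, vsm_char_run])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

In this toy case, d1 rises to the top because both runs rank it highly, even though neither run agrees on the full ordering. This is the kind of complementary evidence that fusion is meant to exploit.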