In this paper, we consider a spoken question answering (QA) task, in which the questions are in form of speech, while the knowledge source for answers are the webpages (in text) over the Internet to be accessed by an information retrieval engine, and we mainly focus on query formulation and re-ranking part. Because the recognition results for the spoken questions are less reliable, we use N-best lists in order to have higher probabilities to induce more correct keywords for the questions, but more noisy words are inevitably included as well. We therefore propose a hierarchical labeling method using tree-structured conditional random fields (CRF) to leverage the parse tree information or the syntactic structure obtained from the N-best-lists of the spoken questions, such that the queries for information retrieval can be better formulated. In addition, because queries formulated from the N-best results naturally generate more noisy information, we further propose to use two-layer random walk for re-ranking the retrieved webpages to produce better documents containing answers. Initial experiments performed on a set of question answering pairs in Mandarin Chinese verified that improved performance was achievable with the proposed approaches.