We investigate a method for discriminating between valid and invalid spoken inquiries received by a speech-oriented guidance system operating in a real environment. Invalid spoken inquiries include background voices, which are not directed at the system, and nonsense utterances. Such inquiries should be rejected in advance. We previously reported a method that uses the likelihood values of Gaussian mixture models (GMMs) to discriminate invalid spoken inquiries from valid ones. In this paper, we improve performance by using not only the likelihood values but also other information in spoken inquiries, such as bag-of-words (BOW) features, utterance duration, and signal-to-noise ratio (SNR). To combine these multiple sources of information, we use a support vector machine (SVM) with a radial basis function (RBF) kernel and a maximum entropy (ME) method, and compare their performance. In the experiments, we achieve an F-measure of 86.6% with the SVM and 84.2% with the ME method, compared with 81.7% for the GMM-based method.
Index Terms: speech-oriented guidance system, spoken inquiry discrimination, support vector machine, maximum entropy, bag-of-words
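As a rough illustration of how the information sources named in the abstract might be combined into a single input for a classifier, the following minimal sketch assembles one feature vector per inquiry and evaluates an RBF kernel between two such vectors. The function names, the feature ordering, the use of raw word counts for BOW, and the example vocabulary are all assumptions made for illustration; the abstract does not specify them. The F-measure helper assumes the balanced form (harmonic mean of precision and recall), which the abstract also does not state.

```python
import math
from typing import List, Sequence


def build_features(
    loglik_valid: float,      # GMM log-likelihood under a "valid" model (assumed input)
    loglik_invalid: float,    # GMM log-likelihood under an "invalid" model (assumed input)
    words: Sequence[str],     # recognized words of the inquiry
    vocabulary: Sequence[str],  # hypothetical fixed BOW vocabulary
    duration_sec: float,      # utterance duration
    snr_db: float,            # signal-to-noise ratio
) -> List[float]:
    """Concatenate likelihoods, duration, SNR, and BOW counts (assumed layout)."""
    bow = [float(list(words).count(w)) for w in vocabulary]
    return [loglik_valid, loglik_invalid, duration_sec, snr_db] + bow


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure from true-positive, false-positive, false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In practice the features would likely need scaling before being passed to an RBF-kernel SVM, since log-likelihoods, durations, and word counts lie on very different ranges; that step is omitted here.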