ISCA Archive Interspeech 2015

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge

Xiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Eng Siong Chng, Haizhou Li

Recent improvements in text-to-speech (TTS) and voice conversion (VC) techniques present a threat to automatic speaker verification (ASV) systems. An attacker can use TTS or VC systems to impersonate a target speaker's voice. To address this threat, we study the detection of such synthetic speech (called spoofing speech) in this paper. We propose to use high dimensional magnitude and phase based features and long term temporal information for the task. In total, 2 types of magnitude based features and 5 types of phase based features are used. For each feature type, we build a component system that uses a multilayer perceptron to predict the posterior probability that the input features were extracted from spoofing speech. The probabilities of all component systems are averaged to produce the score for the final decision. When tested on the ASVspoof 2015 benchmarking task, an equal error rate (EER) of 0.29% is obtained for known spoofing types, which demonstrates the high effectiveness of the 7 features used. For unknown spoofing types, the EER is much higher at 5.23%, suggesting that future research should focus on improving the generalization of the techniques.
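The score fusion and evaluation metric described above can be illustrated with a minimal sketch. It assumes each component system has already produced one spoofing posterior per utterance. The function names `fuse_scores` and `equal_error_rate` are hypothetical, and the threshold sweep is one common way to estimate the EER, not necessarily the procedure used in the challenge evaluation.

```python
import numpy as np


def fuse_scores(posteriors):
    """Average spoofing posteriors over component systems.

    posteriors: array-like of shape (n_systems, n_utterances).
    Returns one fused score per utterance.
    """
    return np.mean(np.asarray(posteriors, dtype=float), axis=0)


def equal_error_rate(scores, is_spoof):
    """Estimate the EER, assuming a higher score means 'more likely spoofed'."""
    scores = np.asarray(scores, dtype=float)
    is_spoof = np.asarray(is_spoof, dtype=bool)
    # Candidate thresholds: every observed score, plus +inf (reject nothing as spoof).
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    # False alarm rate: genuine utterances flagged as spoofed (score >= threshold).
    far = np.array([(scores[~is_spoof] >= t).mean() for t in thresholds])
    # Miss rate: spoofed utterances accepted as genuine (score < threshold).
    miss = np.array([(scores[is_spoof] < t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - miss))
    return (far[idx] + miss[idx]) / 2.0


# Toy data (made up for illustration): 3 component systems, 4 utterances.
posteriors = [
    [0.9, 0.8, 0.2, 0.1],
    [0.7, 0.6, 0.4, 0.3],
    [0.8, 0.7, 0.3, 0.2],
]
labels = [1, 1, 0, 0]  # 1 = spoofed, 0 = genuine
fused = fuse_scores(posteriors)
eer = equal_error_rate(fused, labels)
```

Simple averaging gives every component system equal weight, which matches the fusion stated in the abstract. Weighted or trained fusion would be a different technique.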