ISCA Archive ISCSLP 2006

Frame-level Nonlinearity for Robust DTW-based Speaker Verification

Jian Luan, Jie Hao, Tomonari Kakino, Tomonori Ikumi

Dynamic time warping (DTW) is a successful algorithm in many matching and searching tasks. For text-dependent speaker verification, it remains an appropriate choice when enrollment data are very limited. However, DTW is very sensitive to endpoint variations between the reference template and the test examples. Most research on this issue follows one of two directions: robust endpoint detection or endpoint constraint relaxation. In this paper, we propose a third possible solution that employs a frame-level nonlinear transform. The parameter of the transform function may be universal, template-dependent, or frame-dependent. The method also normalizes the DTW matching distance at the same time. Results indicate that the performance of text-dependent speaker verification is enhanced remarkably in both clean and noisy environments, with relative reductions in equal error rate (EER) of 20.6% and 35.0%, respectively. We expect the proposed method to be effective in other DTW applications as well.

Keywords: speaker verification, dynamic time warping, text-dependent, frame-level nonlinearity
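
To illustrate the general idea of applying a nonlinear transform to each frame-level distance inside DTW, a minimal Python sketch follows. It is not the transform used in this paper: the abstract does not specify the function, so the bounded mapping `squash` and its parameter `alpha` are hypothetical choices made purely for illustration, as are the names `frame_distance` and `dtw`. The sketch uses the classic DTW recursion with fixed endpoints and a Euclidean local distance.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squash(d, alpha=1.0):
    """Hypothetical bounded nonlinearity (an assumption, not the paper's
    function): maps a distance in [0, inf) to [0, 1)."""
    return d / (d + alpha)


def dtw(ref, test, transform=None):
    """Accumulated DTW cost between two frame sequences.

    If `transform` is given, it is applied to every local frame
    distance before accumulation.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            if transform is not None:
                d = transform(d)
            # Standard step pattern: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# A test sequence with one spurious leading frame (an endpoint error).
ref = [[0.0], [1.0], [2.0]]
test = [[10.0], [0.0], [1.0], [2.0]]
plain = dtw(ref, test)
bounded = dtw(ref, test, squash)
```

In this toy case the spurious leading frame contributes its full distance of 10.0 to the plain DTW cost, whereas under the bounded transform it contributes less than 1 (about 0.909 with `alpha=1.0`), so a single endpoint mismatch no longer dominates the score. Because each transformed local distance lies in [0, 1), the accumulated cost is also bounded by the warping path length, which hints at how a frame-level transform could support distance normalization.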