Low-dimensional representations have been shown to outperform their supervector counterparts in a variety of speaker recognition tasks. In this paper, we show that non-linear polynomial kernel support vector machines (SVMs) trained with low-dimensional representations substantially reduce the equal-error rate (EER) obtained by the best performing SVMs trained with supervectors. Non-linear kernel SVMs implicitly map the input features into higher-dimensional spaces, a mechanism known to be effective in general when the number of training instances is much larger than the feature dimension. Unlike linear kernels, non-linear kernels exploit the dependencies among the input feature dimensions through the resulting high-dimensional spaces. Our experiments demonstrate that fifth-order polynomial kernel SVMs trained with low-dimensional representations reduce the EER by 56% relative to standard linear SVMs trained with supervectors, and by 40% relative to the best performing SVMs trained with supervectors.
Index Terms: language recognition, support vector machines
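
To illustrate the contrast drawn above between linear and fifth-order polynomial kernels, the following minimal sketch (using scikit-learn on synthetic data; the feature dimensionality, labels, and hyperparameters are illustrative assumptions, not the paper's experimental setup) trains both kinds of SVM on low-dimensional vectors whose class labels depend on interactions between feature dimensions.

```python
# Sketch: linear vs. fifth-order polynomial kernel SVMs on low-dimensional vectors.
# Synthetic data stands in for low-dimensional (e.g. i-vector-like) representations;
# this is an illustration of the kernel contrast, not the paper's pipeline.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical low-dimensional representations (100 dims) for a two-class task
# whose labels depend on a product of feature dimensions, i.e. a dependency
# a linear decision boundary cannot capture directly.
X = rng.normal(size=(2000, 100))
y = (X[:, :5].sum(axis=1) + 0.5 * X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Linear kernel: the decision function is a hyperplane in the input space.
linear_svm = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)

# Fifth-order polynomial kernel: implicitly maps inputs to a much higher-dimensional
# space whose coordinates include products of input features, so dependencies
# among dimensions can be exploited without forming that space explicitly.
poly_svm = SVC(kernel="poly", degree=5, coef0=1.0, gamma="scale", C=1.0).fit(X_tr, y_tr)

print("linear SVM accuracy:   ", linear_svm.score(X_te, y_te))
print("degree-5 poly accuracy:", poly_svm.score(X_te, y_te))
```

On data of this kind the polynomial kernel typically recovers the interaction term that the linear model misses; the kernel trick keeps training cost tied to the number of instances rather than to the size of the implicit high-dimensional space.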