This study addresses high-dimensional classification problems by means of feature selection. The data sets used are provided by the organizers of the Interspeech 2012 Speaker Trait Challenge. Using a k-nearest-neighbor classifier, a combination of two feature selection approaches yields results that approach or exceed the challenge baselines. The first feature selection method is based on covering the data set with correct unsupervised or supervised classifications obtained from individual features. The second applies a measure of statistical dependence between discretized features and class labels.
Index Terms: pattern recognition, feature selection, high-dimensional data, speaker characteristics
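
As an illustration of the second selection method, the following minimal sketch ranks features by their statistical dependence on the class labels after discretization. The abstract does not name the dependence measure or the discretization scheme, so mutual information and equal-width binning are assumptions made here for illustration only, and the function names `discretize`, `mutual_information`, and `rank_features` are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Equal-width binning of one feature column (assumed scheme, not from the paper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: a single bin
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(rows, labels, n_bins=4):
    """Return feature indices sorted by decreasing dependence on the labels, plus the scores."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(mutual_information(discretize(column, n_bins), labels))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [0.8, 1.0]]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels)
```

In a pipeline of the kind described, the top-ranked features from `order` would be retained and passed to the k-nearest-neighbor classifier; how many features to keep is a tuning choice that this sketch leaves open.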