ISCA Archive AVSP 2009

Space-time audio-visual speech recognition with multiple multi-class probabilistic support vector machines

Samuel Pachoud, Shaogang Gong, Andrea Cavallaro

We extract relevant and informative audio-visual features using multiple multi-class Support Vector Machines with probabilistic outputs, and demonstrate the approach in a noisy audio-visual speech reading scenario. We first extract visual spatio-temporal features and audio cepstral coefficients from spoken digit sequences. Two classifiers are then trained, each on a single modality, to obtain confidence factors that are used to select the most appropriate fusion strategy. A final classifier is trained on the joint audio-visual feature space and used to recognize digits. We demonstrate the proposed approach on a standard database and compare it with alternative methods. The evaluation shows that the proposed approach outperforms the alternatives in both recognition accuracy and robustness.
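The confidence-driven selection step can be illustrated with a minimal sketch. The abstract does not define the confidence factors or list the candidate fusion strategies, so everything here is an assumption made for illustration: the confidence measure (the margin between the two most probable classes), the threshold `min_conf`, the strategy names, and the function names `confidence` and `select_fusion`. The inputs stand for the per-class probability outputs of the two single-modality classifiers.

```python
from typing import Sequence


def confidence(probs: Sequence[float]) -> float:
    """Margin between the two most probable classes.

    Assumed confidence measure; the abstract does not specify one.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_fusion(audio_probs: Sequence[float],
                  visual_probs: Sequence[float],
                  min_conf: float = 0.2) -> str:
    """Pick a hypothetical fusion strategy from per-modality confidences.

    min_conf is an illustrative threshold, not a value from the paper.
    """
    audio_ok = confidence(audio_probs) >= min_conf
    visual_ok = confidence(visual_probs) >= min_conf
    # If both modalities are reliable, or neither is, use the joint
    # audio-visual feature space; otherwise rely on the reliable one.
    if audio_ok == visual_ok:
        return "joint"
    return "audio_only" if audio_ok else "visual_only"


if __name__ == "__main__":
    # Ten digit classes: a flat (uninformative) audio output, as might
    # occur under heavy noise, and a peaked visual output.
    noisy_audio = [0.1] * 10
    clear_visual = [0.82] + [0.02] * 9
    print(select_fusion(noisy_audio, clear_visual))
```

In this sketch, a flat audio posterior yields zero confidence, so the decision falls back to the visual classifier alone; when both modalities are confident, the joint classifier would be used.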