ISCA Archive AVSP 2013

Audio-visual interaction in sparse representation features for noise robust audio-visual speech recognition

Peng Shen, Satoshi Tamura, Satoru Hayamizu

In this paper, we investigate audio-visual interaction in sparse representation to obtain robust features for audio-visual speech recognition. First, we introduce our system, which uses a sparse representation method for noise-robust audio-visual speech recognition. Then, we introduce the dictionary matrix used in this paper and consider the construction of an audio-visual dictionary. Finally, we reformulate the audio and visual signals as a group sparse representation problem in a combined feature-space domain, and we improve the joint sparsity feature fusion method by using the group sparse representation features together with the audio sparse representation features. The proposed methods are evaluated on the CENSREC-1-AV database under both audio noise and visual noise. The experimental results show the effectiveness of the proposed method compared with traditional methods.

Index Terms: sparse representation, audio-visual speech recognition, feature fusion, noise reduction, joint sparsity model
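To illustrate what a sparse representation over a combined audio-visual dictionary can look like, the sketch below stacks each paired audio atom and visual atom into a single joint column. Selecting a joint atom therefore selects its audio and visual parts together, so both modalities share one support, which is the basic idea behind group or joint sparsity. This is a minimal sketch under our own assumptions: the helper names (build_av_dictionary, matching_pursuit), the toy two-dimensional atoms, and the use of plain matching pursuit as the sparse solver are hypothetical illustrative choices, not the dictionary construction or solver described in the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def build_av_dictionary(audio_atoms, visual_atoms):
    """Hypothetical helper: concatenate paired audio/visual atoms into joint,
    unit-norm columns, so atom i covers both modalities at once."""
    if len(audio_atoms) != len(visual_atoms):
        raise ValueError("audio and visual dictionaries must have the same number of atoms")
    return [normalize(list(a) + list(v)) for a, v in zip(audio_atoms, visual_atoms)]


def matching_pursuit(x, atoms, n_nonzero, tol=1e-9):
    """Plain (non-orthogonal) matching pursuit over unit-norm atoms.

    Assumption: chosen only for brevity; returns the coefficient vector
    (one entry per joint atom) and the final residual."""
    residual = list(x)
    coef = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate the residual with every atom and pick the strongest one.
        scores = [sum(r * d for r, d in zip(residual, atom)) for atom in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[j]) < tol:
            break  # nothing left to explain
        coef[j] += scores[j]
        residual = [r - scores[j] * d for r, d in zip(residual, atoms[j])]
    return coef, residual


if __name__ == "__main__":
    # Toy example: 2-D audio features and 2-D visual features per atom.
    audio_atoms = [[1.0, 0.0], [0.0, 1.0]]
    visual_atoms = [[1.0, 0.0], [0.0, 1.0]]
    D = build_av_dictionary(audio_atoms, visual_atoms)
    x = [2.0, 0.0, 2.0, 0.0]  # audio part followed by visual part
    coef, residual = matching_pursuit(x, D, n_nonzero=2)
    print("coefficients:", coef)
    print("residual:", residual)
```

In this toy case only the first joint atom is active (its coefficient is about 2.83, i.e. 2*sqrt(2)) and the residual is essentially zero. The coefficient vector is the kind of sparse feature that could be passed on to a recognizer; how the paper weights or fuses the audio-only and group sparse features is not specified in this abstract.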