Visual information has proven valuable for the robust recognition of distorted speech in many recent investigations. However, visual features may not always be available, and they can be unreliable under unfavorable recording conditions. The same is true of distorted audio information, where noise and interference can corrupt some of the acoustic speech features used for recognition. In this paper, missing feature techniques for coupled HMMs are shown to cope successfully with both uncertain audio and uncertain video information. Since binary uncertainty information can be obtained easily and at little computational cost, this results in an effective approach that can be implemented to obtain significant performance improvements for a wide range of statistical-model-based audiovisual recognition systems.
Index Terms: missing data techniques, audiovisual speech recognition, coupled HMM