Non-Audible Murmur (NAM) is an unvoiced speech signal that can be received through body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. In NAM recordings, body-tissue transmission and the loss of lip radiation act as a low-pass filter; consequently, the higher-frequency components of a NAM signal are attenuated. Because of this spectral reduction, the unvoiced nature of NAM, and its manner of articulation, NAM sounds become acoustically similar to one another, which causes more phoneme confusions than occur with normal speech. In this article, visual information extracted from the talker's facial movements is fused with NAM speech using three fusion methods, and phoneme classification experiments are conducted. The experimental results show a significant improvement in phoneme classification when NAM speech and facial information are fused.