ISCA Archive Interspeech 2007

Multimodal speech recognition with ultrasonic sensors

Bo Zhu, Timothy J. Hazen, James Glass

In this research, we explore multimodal speech recognition by augmenting acoustic information with information obtained from an ultrasonic emitter and receiver. After designing a hardware component to generate a stereo audio/ultrasound signal, we extract sub-band ultrasonic features that supplement conventional MFCC-based audio measurements. A simple interpolation method is used to combine audio and ultrasound model likelihoods. Experiments performed on a noisy continuous digit recognition task indicate that the addition of ultrasonic information reduces word error rates by 24-29% over a wide range of acoustic SNRs (from 20 dB down to 0 dB).
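
The abstract does not specify the form of the interpolation, so the following is only an illustrative sketch of one common way to combine two model scores: a weighted sum of per-hypothesis log-likelihoods. The function names, the weight parameter, and the toy scores are all assumptions made for this example and are not taken from the paper.

```python
def combine_log_likelihoods(audio_ll, ultra_ll, weight):
    """Linearly interpolate two log-likelihoods.

    weight is the (assumed) share given to the ultrasound model:
    0.0 uses audio only, 1.0 uses ultrasound only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1.0 - weight) * audio_ll + weight * ultra_ll


def best_hypothesis(audio_scores, ultra_scores, weight):
    """Return the hypothesis with the highest combined score.

    Both arguments map hypothesis labels to log-likelihoods and are
    expected to share the same keys.
    """
    combined = {
        hyp: combine_log_likelihoods(audio_scores[hyp], ultra_scores[hyp], weight)
        for hyp in audio_scores
    }
    return max(combined, key=combined.get)


# Toy, made-up scores: noisy audio slightly prefers "nine",
# while the ultrasound stream prefers "one".
audio_scores = {"one": -12.0, "nine": -11.5}
ultra_scores = {"one": -8.0, "nine": -10.0}

audio_only = best_hypothesis(audio_scores, ultra_scores, 0.0)
interpolated = best_hypothesis(audio_scores, ultra_scores, 0.5)
```

In this toy case the audio-only decision and the interpolated decision differ, which illustrates how a second stream could change the outcome when the acoustic evidence is weak. In practice, a weight of this kind would typically be tuned on held-out data, possibly as a function of the SNR.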