ISCA Archive Interspeech 2022

DAVIS: Driver’s Audio-Visual Speech recognition

Denis Ivanko, Dmitry Ryumin, Alexey Kashevnik, Alexandr Axyonov, Andrey Kitenko, Igor Lashkov, Alexey Karpov

DAVIS is a driver’s audio-visual assistive system intended to improve the accuracy and robustness of speech recognition for the most frequent driver requests in natural driving conditions. Speech recognition in driving conditions is highly challenging due to acoustic noise, active head turns, pose variation, distance to the recording devices, lighting conditions, etc. We therefore rely on multimodal information and use both an automatic lip-reading system for visual stream processing and ASR for audio stream processing. We have trained the audio and video models on our own RUSAVIC dataset, which contains in-the-wild audio and video recordings of 20 drivers. The recognition application comprises a graphical user interface and modules for audio and video signal acquisition, analysis, and recognition. The obtained results demonstrate rather high performance of DAVIS and also the fundamental possibility of recognizing speech commands using the video modality, even in such difficult natural conditions as driving.