DAVIS is a driver’s audio-visual assistive system intended to improve accuracy and robustness of speech recognition of the most frequent drivers’ requests in natural driving conditions. Since speech recognition in driving condition is highly challenging due to acoustic noises, active head turns, pose variation, distance to recording devices, lightning conditions, etc. We rely on multimodal information and use both automatic lip-reading system for visual stream and ASR for audio stream processing. We have trained audio and video models on own RUSAVIC dataset containing in-the-wild audio and video recordings of 20 drivers. The recognition application comprises a graphical user interface and modules for audio and video signal acquisition, analysis, and recognition. The obtained results demonstrate rather high performance of DAVIS and also the fundamental possibility of recognizing speech commands by using video modality, even in such difficult natural conditions as driving.