This paper describes the NAIST system for the NOTSOFAR-1 (Natural Office Talkers in Settings Of Far-field Audio Recordings) task of the CHiME-8 challenge. Although there is a critical need for real-time processing in everyday applications, most evaluations in the CHiME challenge focus solely on reducing the word error rate. Here, we aim to reduce inference time while improving recognition accuracy. To tackle this issue, we propose enhancing the modular architecture of the baseline by modifying both the CSS and ASR modules. Specifically, our ASR module was built on a WavLM-large feature extractor and a Zipformer transducer. Additionally, we employed block-wise weighted prediction error (WPE) for dereverberation before the speech separation module. Our system achieved a relative tcpWER reduction of 11.6% over the baseline system in the single-channel track and 18.7% in the multi-channel track. Moreover, our system is two to six times faster than the baseline system while achieving better tcpWER results.

This paper presents the CHiME-8 DASR challenge, which carries on from the previous edition, CHiME-7 DASR (C7DASR), and the earlier CHiME-6 challenge. It focuses on joint multi-channel distant speech recognition (DASR) and diarization with one or more, possibly heterogeneous, devices. The main goal is to spur research towards meeting transcription approaches that can generalize across an arbitrary number of speakers, diverse settings (formal vs. informal conversations), meeting durations, a wide variety of acoustic scenarios, and different recording configurations. Novelties with respect to C7DASR include: i) the addition of NOTSOFAR-1, an additional office/corporate meeting scenario; ii) a manually corrected Mixer 6 development set; iii) a new track in which we allow the use of large language models (LLMs); and iv) a jury award mechanism to encourage participants to also explore more practical and innovative solutions.
To lower the entry barrier for participants, we provide a standalone toolkit for downloading and preparing these datasets, as well as for performing text normalization and scoring submissions. Furthermore, this year we also provide two baseline systems: one directly inherited from C7DASR and based on ESPnet, and another developed in NeMo, based on the NeMo team's submission to last year's C7DASR. Baseline system results suggest that the addition of the NOTSOFAR-1 scenario significantly increases the task's difficulty due to its high number of speakers and very short meeting durations.
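As a side note, the relative error-rate reductions quoted in these abstracts (e.g. the 11.6% tcpWER figure) follow the standard relative-improvement formula, (baseline − system) / baseline. A minimal sketch of that arithmetic; the absolute tcpWER values used below are hypothetical, chosen only to illustrate the calculation, and are not the systems' actual scores:

```python
def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative error-rate reduction of `system_wer` over `baseline_wer`.

    Both arguments are error rates (e.g. tcpWER), in percent or as
    fractions; the result is a fraction (multiply by 100 for percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical example (invented numbers): a baseline tcpWER of 30.0%
# and a system tcpWER of 26.52% give an 11.6% relative reduction,
# i.e. the same form as the single-channel figure reported above.
print(f"{relative_reduction(30.0, 26.52):.1%}")  # prints "11.6%"
```

Note that a relative reduction is always computed against the baseline's own error rate, so the same absolute improvement looks larger when the baseline error rate is lower.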