Recent work on directional automatic speech recognition (DASR) has enabled automatic transcription of a conversation partner several feet away via smart glasses. DASR leverages the glasses' multiple microphones by running several beamformers simultaneously. We aim to make DASR robust to scenarios that also involve text-to-speech (TTS) playback, which could enable additional future applications such as simultaneous speech translation. The challenge is to prevent the ASR from capturing the system's own TTS output while preserving the clarity of the captured conversation partner's speech. We experiment with two modern linear acoustic echo cancellation (AEC) algorithms. To remedy accuracy regressions caused by echo residuals, we propose AEC-aware model training. While AEC alone eliminates most of the TTS loopback, dramatically reducing the word error rate (WER) by over 70%, AEC-aware model training yields further relative WER reductions of 13% or more.
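
The abstract does not name the two linear AEC algorithms used. As a minimal illustrative sketch of linear AEC in general (not the authors' method), a normalized least-mean-squares (NLMS) adaptive filter estimates the TTS echo from the known playback reference and subtracts it from the microphone signal; all function names and parameter values below are hypothetical.

```python
import numpy as np

def nlms_aec(mic, ref, filter_len=256, mu=0.5, eps=1e-8):
    """Sketch of linear AEC via an NLMS adaptive filter.

    mic: microphone signal = near-end speech + TTS echo
    ref: far-end reference (the TTS signal sent to the loudspeaker)
    Returns the echo-cancelled signal (the filter's error signal).
    """
    w = np.zeros(filter_len)              # adaptive filter taps
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        # Most recent `filter_len` reference samples, newest first
        x = ref[max(0, n - filter_len + 1):n + 1][::-1]
        x = np.pad(x, (0, filter_len - len(x)))
        y_hat = w @ x                     # estimated echo sample
        e = mic[n] - y_hat                # near-end speech + echo residual
        out[n] = e
        # NLMS update, normalized by instantaneous reference power
        w += mu * e * x / (x @ x + eps)
    return out

# Toy usage: near-end speech plus a delayed, attenuated copy of the
# TTS reference standing in for the loudspeaker-to-mic echo path.
fs = 16000
ref = np.random.randn(fs)
echo = 0.5 * np.concatenate([np.zeros(40), ref[:-40]])
near = 0.1 * np.random.randn(fs)
cleaned = nlms_aec(near + echo, ref)
```

In this framing, the residual left in `cleaned` after imperfect cancellation is exactly what AEC-aware model training would expose the ASR model to, so that recognition accuracy degrades less when the echo is not fully removed.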