Despite the rapid advancement of automatic speech recognition (ASR) systems, spontaneous conversations still pose a major challenge, even more so for low-resource languages, dialects, and non-dominant varieties. Moreover, the lively turn-taking of conversational speech produces short utterances, which have been found to be error-prone for transformer-based ASR systems that rely on larger context. This raises the question of which type of context is most useful: more from the same speaker, providing acoustically relevant context, or more from the conversation as a whole, mixing utterances from both speakers and providing semantically relevant context. Comparing seven ASR systems on conversational Austrian German, we find that performance is best with a minimum of 20 s of context, regardless of whether it comes from the same or from the other speaker. Systems fine-tuned on data from the same variety and speaking style require less context and overall outperform zero-shot systems.
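
To make the two context conditions concrete, the following is a minimal sketch of how one might assemble a fixed-duration context window from preceding utterances, either restricted to the same speaker or drawn from the whole conversation. The data structure and function names are hypothetical illustrations, not the paper's actual pipeline.

```python
# Hypothetical sketch: building same-speaker vs. mixed-speaker context windows.
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str   # speaker label, e.g. "A" or "B"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcript

def context_window(utterances, target_idx, min_context_s=20.0,
                   same_speaker_only=False):
    """Walk backwards from the target utterance, collecting preceding
    utterances until their total duration reaches min_context_s.
    Returns the selected context in chronological order."""
    target = utterances[target_idx]
    context, total = [], 0.0
    for utt in reversed(utterances[:target_idx]):
        if same_speaker_only and utt.speaker != target.speaker:
            continue  # skip the other speaker's turns in the same-speaker condition
        context.append(utt)
        total += utt.end - utt.start
        if total >= min_context_s:
            break
    return list(reversed(context))

# Example: at least 20 s of mixed-speaker context preceding utterance 10
# ctx = context_window(conversation, target_idx=10, min_context_s=20.0)
```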