Despite significant advances in speech recognition, systems still struggle to accurately transcribe spontaneous speech from underrepresented groups such as non-standard speakers or younger individuals. The difficulty increases when these conditions overlap. To explore this issue, we use a dataset of spontaneous and read speech from young speakers in Germany, including speakers from both mono-ethnic and multi-ethnic backgrounds. We compare recognition performance, taking gender into account, across three Automatic Speech Recognition (ASR) engines: Whisper (OpenAI), NeMo (NVIDIA), and Wav2Vec2.0 (Meta AI). We further conduct an error analysis of the automatically generated transcripts using part-of-speech (POS) tagging, which allows us to identify the word types that are hardest for the ASR engines to recognize.
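
To make the POS-based error analysis concrete, the following is a minimal sketch and not the study's actual pipeline. It assumes word error rate (WER) as the performance measure, which the abstract does not name, and it assumes the reference transcript already carries one POS tag per word. The function names `align` and `pos_error_rates` and the example sentence and tags are hypothetical. Substitutions and deletions are attributed to the POS tag of the affected reference word; insertions count toward WER only.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two word lists; return (op, ref_idx, hyp_idx) tuples."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1 if (i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]) else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            ops.append(("ok" if cost == 0 else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", None, j - 1))
            j -= 1
    ops.reverse()
    return ops


def pos_error_rates(ref_words, ref_tags, hyp_words):
    """Return (WER, {tag: share of reference words with that tag that were
    substituted or deleted}). Assumes a non-empty reference."""
    totals = Counter(ref_tags)
    errors = Counter()
    n_errors = 0
    for op, ref_idx, _ in align(ref_words, hyp_words):
        if op != "ok":
            n_errors += 1
        if op in ("sub", "del"):
            errors[ref_tags[ref_idx]] += 1
    wer = n_errors / len(ref_words)
    return wer, {tag: errors[tag] / totals[tag] for tag in totals}


# Hypothetical example: hand-written tags, not the output of a real tagger.
ref = "ich habe das Buch gelesen".split()
tags = ["PRON", "AUX", "DET", "NOUN", "VERB"]
hyp = "ich hab Buch gelesen".split()
wer, per_tag = pos_error_rates(ref, tags, hyp)
```

In practice, the tags would come from a POS tagger run on the reference transcripts, and both transcripts would be normalized (casing, punctuation) before alignment. When several alignments are equally cheap, the traceback picks one of them, so the attribution of an error to a particular word can vary with tie-breaking.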