Speech Sound Disorders (SSD) are common among children, affecting their academic, social, and emotional development. Traditional diagnostic methods rely on assessment by speech-language pathologists, which makes them resource-intensive. Given the global shortage of experts and the increasing demand, exploring deep-learning tools is crucial. This study adapts a multi-task framework to fine-tune a pre-trained multilingual Wav2Vec model, tackling Automatic Speech Recognition and SSD classification for German children using a custom dataset. We show that incorporating public out-of-domain datasets improves robustness and generalizability. Interestingly, combining pathological and typical speech data with mispronunciations improves both speech recognition and SSD detection. Finally, we investigate a two-step training of the model that further improves overall performance.