Supervised training of end-to-end speech recognition systems usually requires large amounts of transcribed speech data to achieve reasonable performance. This hinders their application to problems where annotated data is scarce, since manual labeling is costly. Often, however, large amounts of speech data with imperfect transcriptions are available, and these can be automatically aligned to generate noisy labels. In this work, we compare supervised learning on noisy data from forced alignment with semi-supervised learning and self-supervised representation learning, both of which have shown great success in improving speech recognition using unlabeled data. We employ noisy student training for semi-supervised learning and wav2vec 2.0 for self-supervised representation learning. We compare these methods on 2324 hours of Swiss German audio with automatically aligned Standard German text. Using the speech data with noisy labels for supervised learning leads to a word error rate (WER) of 26.4% on our test set. Using the same data for wav2vec 2.0 pretraining leads to a WER of 27.8%. With noisy student training, we achieve a WER of 30.3%.
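The word error rate used to compare the three approaches is the standard metric: the word-level Levenshtein distance between hypothesis and reference, normalized by the number of reference words. A minimal sketch (the function name `wer` is illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance (substitutions,
    insertions, deletions) divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a single substitution in a three-word reference yields a WER of 1/3; a WER of 26.4% means roughly one word error per four reference words.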