Since disfluencies are frequent in conversational speech, they have received considerable attention, both from speech technologists seeking to make automatic speech recognition (ASR) more robust and from speech scientists seeking to learn more about human speech processing. For ASR, the most established quality measure is the word error rate (WER), while for human recognition, one of the measures used is the recall of words or of utterance-level semantics. We conduct a transcription experiment in which we present the same disfluent utterances to 54 participants and nine ASR systems. We analyse which factors affect transcription in the context of syntactic disfluencies and filler particles, including well-known factors such as pronunciation variation and articulation rate. We find that, surprisingly, humans and ASR systems struggle with largely the same characteristics of conversational speech, despite their mean WERs differing by about 10%, and that the presence or absence of filler particles does not affect the WER.
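
For readers unfamiliar with the measure, the WER is conventionally computed as the word-level edit distance between a reference transcript and a hypothesis (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring pipeline used in this study; the function name `word_error_rate`, the whitespace tokenisation, and the example utterance are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared verbatim;
    real evaluations usually normalise case, punctuation, and spelling first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Invented example: a transcriber omits the filler particle "uh".
print(word_error_rate("i uh want to go", "i want to go"))  # 0.2
```

In this invented example the omitted filler counts as one deletion out of five reference words. Whether filler particles are kept in or removed from the reference is therefore a scoring decision that directly changes the reported WER.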