Many speaking tests are conversational, or dialogic, in form, with an interlocutor talking to one or more candidates. This paper investigates how such a test can be assessed automatically. State-of-the-art approaches are used to build a multi-stage pipeline: diarization and speaker assignment, to detect who is speaking and when; automatic speech recognition (ASR), to produce a transcript; and finally assessment. Each stage presents challenges that are investigated in the paper. Advanced foundation-model-based auto-markers are examined: an ensemble of Longformer-based models that operates on the ASR output text, and a wav2vec2-based system that works directly on the audio. The two are combined to yield the final score. This fully automated system is evaluated in terms of ASR performance, the related impact of candidate assignment, and prediction of the candidate mark, on data from the Occupational English Test, a conversational speaking test for L2 English healthcare professionals.
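To make the structure of the pipeline concrete, the sketch below outlines one possible arrangement of the stages summarised above. The function names, stub signatures, and the simple weighted-average combination of the text and audio graders are illustrative assumptions for exposition only, not the implementation used in the paper.

```python
# Illustrative sketch of a multi-stage assessment pipeline (assumed structure).
# All component names and the final combination rule are hypothetical.

from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A diarized stretch of audio attributed to one speaker."""
    speaker: str   # e.g. "candidate" or "interlocutor"
    start: float   # seconds
    end: float     # seconds


def diarize_and_assign(audio_path: str) -> List[Segment]:
    """Stage 1 (hypothetical): detect who is speaking and when,
    then assign each speaker cluster to candidate or interlocutor."""
    raise NotImplementedError


def transcribe(audio_path: str, segments: List[Segment]) -> str:
    """Stage 2 (hypothetical): run ASR over the candidate's segments
    to produce a transcript for the text-based grader."""
    raise NotImplementedError


def text_grader_ensemble(transcript: str) -> float:
    """Stage 3a (hypothetical): ensemble of Longformer-based models
    scoring the ASR transcript."""
    raise NotImplementedError


def audio_grader(audio_path: str, segments: List[Segment]) -> float:
    """Stage 3b (hypothetical): wav2vec2-based model scoring the
    candidate's audio directly."""
    raise NotImplementedError


def assess(audio_path: str, text_weight: float = 0.5) -> float:
    """Run the full pipeline and combine the two graders.
    A simple weighted average is assumed here for illustration."""
    segments = diarize_and_assign(audio_path)
    transcript = transcribe(audio_path, segments)
    text_score = text_grader_ensemble(transcript)
    audio_score = audio_grader(audio_path, segments)
    return text_weight * text_score + (1.0 - text_weight) * audio_score
```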