This paper aims to establish relationships between conversational markers and health outcomes using data from cardio-pulmonary rehabilitation sessions. Specifically, we use speech and text data from conversations between patients and researchers to assess exercise compliance and psychological wellbeing. We train a Multimodal Transformer (MMT) on speech and transcripts, supervised with ground-truth labels. We further evaluate MMT's predictive performance using session summaries, generated by three Large Language Models (LLMs), that focus on dialogue characteristics (e.g., sentiment, thematic content, and future planning). Our findings indicate that it is feasible to augment clinical sessions with speech and language processing to improve decision-making and health outcomes.