We investigate the interplay between credibility and expertise level in text and speech. We collect a unique domain-specific multimodal dataset and analyze a set of acoustic-prosodic and linguistic features in both credible and less credible speech produced by professionals of varying expertise levels. Our analyses shed light on potential indicators of domain-specific perceived credibility and expertise, as well as the interplay between them. Moreover, building on state-of-the-art self-supervised pre-trained language models, we develop multimodal, multi-task deep learning models that exceed human performance by 6.2% on credibility and 3.8% on expertise-level prediction. To our knowledge, this is the first multimodal, multi-task study to jointly analyze and predict domain-specific credibility and expertise level.