Obstructive sleep apnea (OSA) is a condition that commonly affects middle-aged men; it disturbs sleep, causes daytime tiredness, and increases the risk of heart disease. Because OSA is associated with changes in throat structure, speech can serve as a valuable biomarker for detecting the condition and predicting its severity. This study proposes a new deep-learning-based method for detecting OSA from speech recordings of participants in sitting and lying positions. The method uses a Siamese structure in which a pre-trained XLSR model encodes ten utterances for each position. Using a pre-trained encoder reduces the amount of training data required, and the paired design allows changes in throat structure between the two positions to be compared through voice analysis. The study also explores the use of patient characteristic features. This approach achieves an F1 score of 0.725 on our in-house dataset, demonstrating the feasibility of end-to-end speech-based OSA detection with foundation models.
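As a rough illustration of the Siamese arrangement described above, the following minimal sketch applies one shared encoder to the utterances from both positions and combines the pooled embeddings. It is not the study's implementation: `toy_encoder` is a hypothetical stand-in for the pre-trained XLSR model, and the mean pooling, the absolute-difference fusion, and the placeholder patient features are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def toy_encoder(waveform: Sequence[float], dim: int = 4) -> Vector:
    """Hypothetical stand-in for a pre-trained speech encoder such as XLSR.

    Splits the waveform into `dim` chunks and returns the mean absolute
    amplitude of each chunk. A real system would return learned embeddings.
    """
    size = max(1, len(waveform) // dim)
    return [
        sum(abs(x) for x in waveform[i * size:(i + 1) * size]) / size
        for i in range(dim)
    ]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors element-wise (assumed pooling choice)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def siamese_features(
    sitting: Sequence[Sequence[float]],
    lying: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], Vector],
    patient_features: Sequence[float] = (),
) -> Vector:
    """Encode both positions with the SAME encoder and fuse the results."""
    sit = mean_pool([encode(u) for u in sitting])  # branch 1: sitting
    lie = mean_pool([encode(u) for u in lying])    # branch 2: lying
    # Assumed fusion: both embeddings plus their element-wise absolute
    # difference, which exposes how the voice changes between positions.
    diff = [abs(a - b) for a, b in zip(sit, lie)]
    return sit + lie + diff + list(patient_features)


# Ten synthetic "utterances" per position (random noise, for shape only).
rng = random.Random(0)
sitting = [[rng.uniform(-1, 1) for _ in range(160)] for _ in range(10)]
lying = [[rng.uniform(-1, 1) for _ in range(160)] for _ in range(10)]

# Two placeholder patient characteristics (hypothetical values).
features = siamese_features(sitting, lying, toy_encoder, [0.5, 0.3])
print(len(features))
```

The resulting vector would then be passed to a classifier that predicts the OSA label. Sharing one encoder across both branches is what makes the structure Siamese: any difference between the two pooled embeddings reflects the recordings rather than differing model weights.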