Magnetic resonance imaging (MRI) of the vocal tract has become one of the preferred imaging modalities for the analysis of human speech production. However, the raw image data must be segmented before further analysis can take place. This paper describes a hybrid approach for extracting a 3D tongue model from 3D or 2D MRI scans of the vocal tract during speech, combining unsupervised image segmentation with a mesh deformation technique. An efficient, minimally supervised segmentation algorithm is available as a robust fallback in certain isolated cases. Both image segmentation algorithms produce a point cloud, which is completed and registered by deforming a template mesh to the data. Because the mesh deformation can be applied even to a sparse point cloud, realistic 3D tongue shapes can be extracted from the 2D video frames of real-time MRI as well. Our approach is applied to several sets of available MRI data and yields promising results.