MRI recordings of the vocal tract let researchers obtain anatomical cross-sections non-invasively, making them an important tool for speech production research. However, acquiring MRI data at both high temporal and high spatial resolution remains challenging. We propose an image processing method that synthesises a real-time, high-spatial-resolution 3D movie from real-time 2D MRI and static high-spatial-resolution 3D MRI data of the same speaker. We evaluate the method on a public dataset of 17 speakers and show that a real-time 2D movie of the vocal tract during a speech task can be encoded as combinations of a small number of its frames. By matching static vocal tract articulations in the high-spatial-resolution 3D data to frames of the 2D movie, these combinations can be transferred to the 3D domain, synthesising a 3D movie of the speech task. Our algorithmic method provides a generic approach that can complement technical improvements to the acquisition process.
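
To make the encode-and-transfer idea concrete, the following is a minimal sketch and not the method as implemented in this work. It assumes that the "combinations" are linear and that their weights are found by ordinary least squares; the abstract specifies neither. The function names `encode_movie` and `transfer_to_3d`, the argument `key_idx` (indices of the chosen 2D frames), and the array layouts are all hypothetical. How the key frames are selected and how 3D volumes are matched to them is not addressed here.

```python
import numpy as np


def encode_movie(frames_2d, key_idx):
    """Express every 2D frame as a linear combination of a few key frames.

    ASSUMPTION: linear combinations fitted by least squares; the source
    does not state how the combinations are computed.
    frames_2d: array of shape (T, H, W); key_idx: K frame indices.
    Returns weights of shape (T, K).
    """
    n_frames = frames_2d.shape[0]
    flat = frames_2d.reshape(n_frames, -1).astype(float)  # (T, P)
    basis = flat[key_idx]                                  # (K, P)
    # Solve basis.T @ w_t ~= frame_t for all frames t at once.
    weights, *_ = np.linalg.lstsq(basis.T, flat.T, rcond=None)  # (K, T)
    return weights.T


def transfer_to_3d(weights, volumes_3d):
    """Apply the 2D weights to static 3D volumes matched to the key frames.

    volumes_3d: array of shape (K, D, H, W), where volume k is assumed
    (hypothetically) to show the same articulation as key frame k.
    Returns a synthesised 3D movie of shape (T, D, H, W).
    """
    n_keys = volumes_3d.shape[0]
    flat = volumes_3d.reshape(n_keys, -1).astype(float)    # (K, Q)
    movie = weights @ flat                                 # (T, Q)
    return movie.reshape((weights.shape[0],) + volumes_3d.shape[1:])
```

In this sketch the weights are estimated purely in the 2D domain and reused unchanged in the 3D domain, which is one simple way to read "transferring the combinations"; constraints such as non-negativity or sparsity of the weights would need a different solver.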