The prosody of speech is closely related to the syntactic structure of the spoken sentence, so analysis models that jointly consider these two types of information are promising. However, manual annotation of syntactic information and of prosodic information such as pauses is laborious, which makes it difficult to obtain sufficient data to train such joint models. In this paper, we tackle this problem by introducing a joint pause prediction and dependency parsing model that treats pauses between consecutive words as latent variables. With this model, it is possible to learn not only from data labeled with both syntax and pause information, but also from data labeled with only syntactic information, which can be obtained in larger quantities. Experiments show that the joint pause prediction and dependency parsing model obtains a better pause prediction F-measure than a decision-tree-based baseline trained on the same data, and that adding more data through the proposed latent variable model leads to further gains of up to 11.6 points in F-measure.
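
As an illustrative sketch only, in notation that is assumed here and not taken from the paper: let $x$ be a sentence of $n$ words, $y$ its dependency tree, and $z = (z_1, \dots, z_{n-1})$ binary indicators of a pause between consecutive words. Write $\mathcal{D}_{\mathrm{full}}$ for the data labeled with both syntax and pauses and $\mathcal{D}_{\mathrm{syn}}$ for the data labeled with syntax only. One common way to treat pauses as latent variables is to sum them out wherever they are unobserved, which would give a training objective of the form

\[
\mathcal{L}(\theta) =
\sum_{(x, y, z) \in \mathcal{D}_{\mathrm{full}}} \log p_\theta(y, z \mid x)
\; + \;
\sum_{(x, y) \in \mathcal{D}_{\mathrm{syn}}} \log \sum_{z'} p_\theta(y, z' \mid x).
\]

The first term uses the fully labeled data directly. The second lets syntax-only sentences contribute by marginalizing over every possible pause assignment $z'$. The actual parameterization and training procedure of the proposed model may differ from this generic form.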