The objective and automated monitoring of depression using behavioral signals is confounded by the wide clinical profile of this commonly occurring mood disorder. This paper introduces Relevance Vector Machines (RVMs) as a novel method for predicting clinical depression scores from paralinguistic cues. It highlights many of the advantages RVMs can offer depression prediction: sparsity, implicit noise characterization, an explicit probabilistic output, and a heterogeneous mapping property, which allows one or more arbitrary non-linear transforms to be used in conjunction with an RVM. Results indicate that RVMs can perform as strongly as Support Vector Regression in a brute-forcing paradigm. Of particular interest is the heterogeneous mapping property, which improves RVM performance without requiring a search of the operating parameter space that is expensive in terms of both data and time.