Humans infer a wide array of meanings from expressive nonverbal vocalizations, \eg laughs, cries, and sighs. Thus far, computational research has primarily focused on the coarse classification of vocalizations such as laughs, an approach that overlooks significant variation in the meaning of distinct laughs (e.g., amusement, awkwardness, triumph) and the rich array of more nuanced vocalizations people produce. Nonverbal vocalizations are shaped by the emotional state an individual seeks to convey, their wellbeing, and (as with the voice more broadly) their identity-related traits. In the present study, we utilize a large-scale dataset comprising more than 35 hours of densely labeled vocal bursts to model emotionally expressive states and demographic traits from nonverbal vocalizations. We compare single-task and multi-task deep learning architectures to explore how models can leverage acoustic co-dependencies that may exist between the expression of 10 emotions in vocal bursts and the demographic traits of the speaker. Results show that nonverbal vocalizations can be reliably leveraged to predict emotional expression, age, and country of origin. In a multi-task setting, our experiments show that joint learning of emotional expression and demographic traits yields robust results and is primarily beneficial for classifying a speaker's country of origin.
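
To make the single-task versus multi-task comparison concrete, the following is a minimal sketch, not the authors' exact architecture, of a multi-task network in PyTorch: a shared acoustic encoder over a precomputed feature sequence feeds separate heads for 10-dimensional emotion regression, age regression, and country classification. Module names, dimensions, and the uniform loss weighting are illustrative assumptions, not details taken from this paper.

\begin{verbatim}
# Minimal PyTorch sketch of a multi-task model for vocal bursts.
# Assumptions (not from the paper): inputs are precomputed acoustic
# feature frames, emotion targets are 10 continuous ratings in [0, 1],
# and the three task losses are weighted uniformly.
import torch
import torch.nn as nn


class MultiTaskVocalBurstModel(nn.Module):
    def __init__(self, n_features=64, hidden=128, n_emotions=10, n_countries=4):
        super().__init__()
        # Shared encoder: a GRU pools the frame sequence into one embedding.
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        # Task-specific heads branching off the shared embedding.
        self.emotion_head = nn.Linear(hidden, n_emotions)   # continuous ratings
        self.age_head = nn.Linear(hidden, 1)                 # scalar age
        self.country_head = nn.Linear(hidden, n_countries)   # class logits

    def forward(self, x):
        _, h = self.encoder(x)             # h: (1, batch, hidden)
        shared = h.squeeze(0)
        return {
            "emotion": torch.sigmoid(self.emotion_head(shared)),
            "age": self.age_head(shared).squeeze(-1),
            "country": self.country_head(shared),
        }


def multitask_loss(outputs, targets):
    # Uniform weighting of the three task losses (an illustrative choice).
    emotion_loss = nn.functional.mse_loss(outputs["emotion"], targets["emotion"])
    age_loss = nn.functional.mse_loss(outputs["age"], targets["age"])
    country_loss = nn.functional.cross_entropy(outputs["country"], targets["country"])
    return emotion_loss + age_loss + country_loss


if __name__ == "__main__":
    model = MultiTaskVocalBurstModel()
    x = torch.randn(8, 200, 64)  # batch of 8 clips, 200 frames, 64 features
    targets = {
        "emotion": torch.rand(8, 10),
        "age": torch.rand(8) * 60 + 18,
        "country": torch.randint(0, 4, (8,)),
    }
    loss = multitask_loss(model(x), targets)
    loss.backward()
    print(float(loss))
\end{verbatim}

A single-task baseline under the same assumptions would keep the encoder but retain only one of the three heads, training a separate model per task.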