We present a large-scale study on classification of linguistic and non-linguistic vocalizations including laughter, vocal noise, hesitation and consent on four corpora amounting to 46 hours of spontaneous conversational speech. We consider training and testing on speaker-independent subsets of single corpora (intra-corpus) as well as inter-corpus experiments where models built on one or more corpora are evaluated on a disjoint corpus. Our results reveal that while inter-corpus performance is considerably lower than comparable intra-corpus results, this effect can be countered by data agglomeration; furthermore, we observe that inter-corpus classification accuracies indicate suitability of corpora for building generalizing models.