This work presents a novel framework for automatically assessing the unpleasantness that audio events cause to a human listener, a relatively new research problem. Mel-frequency cepstral coefficients and temporal modulation parameters are used to characterize 75 sound stimuli ranging from animal calls to baby cries. The final assessment is made by a clustering scheme based on Gaussian mixture models. In terms of mean squared error and correlation between predicted and measured unpleasantness levels, the proposed framework achieves the best performance reported so far in the literature.
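
As a rough illustration of the pipeline summarized above, the following Python sketch shows one way a Gaussian-mixture clustering stage could map feature vectors to an unpleasantness estimate and then be scored by mean squared error and correlation. Everything in it is an assumption rather than the method of this work: the feature vectors are synthetic 2-D stand-ins for the Mel-frequency cepstral and temporal modulation parameters, the rating values are made up, the mixture uses diagonal covariances with a farthest-point initialisation, and predicting a rating as the responsibility-weighted average of per-cluster mean ratings is only one plausible reading of "clustering scheme". All function names are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def init_means(data, k):
    """Deterministic farthest-point initialisation (an assumption)."""
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(
            sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    return means

def fit_gmm(data, k, iters=30, floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM."""
    n, dim = len(data), len(data[0])
    weights = [1.0 / k] * k
    means = init_means(data, k)
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(iters):
        # E-step: soft assignment of every point to every component.
        resp = [responsibilities(x, weights, means, variances) for x in data]
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(floor, sum(r[j] * (x[d] - means[j][d]) ** 2
                               for r, x in zip(resp, data)) / nj)
                for d in range(dim)]
    return weights, means, variances

def component_ratings(data, ratings, model):
    """Responsibility-weighted mean rating of each component."""
    k = len(model[0])
    num, den = [0.0] * k, [1e-12] * k
    for x, y in zip(data, ratings):
        for j, r in enumerate(responsibilities(x, *model)):
            num[j] += r * y
            den[j] += r
    return [a / b for a, b in zip(num, den)]

def predict(x, model, comp_ratings):
    """Predicted rating: component ratings weighted by responsibilities."""
    resp = responsibilities(x, *model)
    return sum(r * y for r, y in zip(resp, comp_ratings))

def mse(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def demo(seed=0):
    """Synthetic data: two feature clusters with low and high ratings."""
    rng = random.Random(seed)
    data, ratings = [], []
    for centre, level in ((0.0, 2.0), (5.0, 8.0)):
        for _ in range(20):
            data.append([rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)])
            ratings.append(rng.gauss(level, 0.3))
    model = fit_gmm(data, k=2)
    comp = component_ratings(data, ratings, model)
    # Scored on the training points themselves, for illustration only.
    preds = [predict(x, model, comp) for x in data]
    return mse(preds, ratings), pearson(preds, ratings)

print(demo())
```

The sketch scores predictions on the same points used for fitting, which keeps it short but overstates accuracy; a real evaluation would hold out stimuli, for example by leave-one-out over the sound set.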