We present the results of a series of machine learning experiments on classifying the discourse function of single affirmative cue words such as alright, okay and mm-hm in a spoken dialogue corpus. We suggest that a simple discourse/sentential distinction is not sufficient for such words and propose two additional classification sub-tasks: identifying (a) whether such words convey acknowledgment or agreement, and (b) whether they cue the beginning or end of a discourse segment. We also study the classification of each individual word into its most common discourse functions. We show that models based on contextual features extracted from the time-aligned transcripts approach the error rate of trained human aligners.