In this paper, we present the key lessons learnt from numerous evaluations conducted to measure the quality of spoken and multimodal applications. The issues we address include the relationship between laboratory and field studies, long-term and pilot evaluations, unimodality and multimodality, objective and subjective metrics, and user expectations and experiences. We discuss these issues through concrete case studies. For example, evaluating speech-only systems differs in major ways from evaluating multimodal systems. Similarly, laboratory and field studies differ in major ways, and these differences need to be considered for an evaluation to succeed.