Socially annotated resources such as Yahoo! Answers already provide broad coverage of hierarchical topic categories and include millions of documents annotated by web users. This paper argues that topic language model (LM) adaptation that leverages such social annotations, even though they may be noisy, may be more effective than unsupervised methods such as clustering-based and LDA-based algorithms. Experimental results on the IWSLT-2011 TED ASR data sets show modest improvements over these unsupervised methods.
Index Terms: Topic language model adaptation, social annotations, speech recognition.