ISCA Archive Interspeech 2015

Positional language modeling for extractive broadcast news speech summarization

Shih-Hung Liu, Kuan-Yu Chen, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu

Extractive summarization, which aims to automatically select a set of representative sentences from a text (or spoken) document so as to concisely express the most important theme of the document, has been an active area of experimentation and development. A recent trend of research is to employ the language modeling (LM) approach for important sentence selection, which has proven to be effective for performing extractive summarization in an unsupervised fashion. However, one of the major challenges facing the LM approach is how to formulate the sentence models and estimate their parameters more accurately for each text (or spoken) document to be summarized. This paper extends this line of research, and its contributions are three-fold. First, we propose a positional language modeling framework that uses different granularities of position-specific information to better estimate the sentence models involved in summarization. Second, we explore integrating the positional cues into relevance modeling through a pseudo-relevance feedback procedure. Third, the utilities of the various methods originating from our proposed framework and of several well-established unsupervised methods are analyzed and compared extensively. Empirical evaluations conducted on a broadcast news summarization task seem to demonstrate the performance merits of our summarization methods.
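
The LM approach to sentence selection mentioned above can be illustrated with a minimal sketch. It is not the formulation used in the paper: the abstract does not give the model details, so the sketch assumes a common variant in which each sentence is treated as a unigram model, smoothed by linear interpolation with a background model estimated from the document itself, and sentences are ranked by the log-likelihood of the document under each sentence model. The function name `rank_sentences` and the interpolation weight `lam` are hypothetical, and no positional information is modeled here.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Rank tokenized sentences by log P(document | sentence model).

    Assumption (not from the paper): each sentence model is a unigram
    distribution interpolated with a document-level background model,
    with weight `lam` on the sentence estimate (0 <= lam < 1).
    Returns sentence indices, best-scoring first.
    """
    doc_tokens = [word for sent in sentences for word in sent]
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    scores = []
    for idx, sent in enumerate(sentences):
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_like = 0.0
        for word, count in doc_counts.items():
            # Maximum-likelihood estimate from the sentence alone.
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            # Background estimate from the whole document (always > 0 here).
            p_bg = count / doc_len
            log_like += count * math.log(lam * p_sent + (1 - lam) * p_bg)
        scores.append((log_like, idx))

    ranked = sorted(scores, key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in ranked]


if __name__ == "__main__":
    toy_document = [
        ["stocks", "fell", "today"],
        ["stocks", "fell", "sharply", "stocks"],
        ["weather", "sunny"],
    ]
    print(rank_sentences(toy_document))
```

A summary would then be formed by taking the top-ranked sentences until a length budget is reached. A positional variant, as proposed in the paper, would change how the sentence models are estimated; that step is deliberately left out because its details are not stated in the abstract.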