ISCA Archive Eurospeech 1999

Topic detection in broadcast news

Frederick Walls, Hubert Jin, Sreenivasa Sista, Richard Schwartz

We propose a system for the Topic Detection and Tracking (TDT) detection task, which concerns the unsupervised grouping of news stories by topic. We cluster stories with an incremental k-means algorithm. To compare stories, we use a probabilistic document similarity metric and a traditional vector-space metric. We note that the clustering algorithm requires two different types of metrics, and we adapt the similarity metrics for each purpose. The system achieves a topic-weighted miss rate of 12% at a false accept rate of 0.22%.
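To make the clustering idea concrete, the following is a minimal sketch of incremental clustering over a stream of stories, not the system described above. It assumes a single pass with no re-assignment of earlier stories, whitespace tokenization, and cosine similarity over raw term counts as a stand-in for the vector-space metric; the probabilistic metric is not reproduced. The names `cosine`, `incremental_cluster`, and `threshold` are hypothetical. One plausible reading of the "two types of metrics" is shown here by using the same score in two roles: selecting the closest cluster, and deciding by threshold whether to start a new one.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(stories, threshold=0.3):
    """Assign each story to its closest cluster, or start a new cluster.

    Assumption: a single-pass simplification. Each cluster is represented
    by the summed term counts of its member stories.
    """
    clusters = []  # one Counter of summed term counts per cluster
    labels = []    # cluster index assigned to each story, in input order
    for text in stories:
        vec = Counter(text.lower().split())
        # Selection role: find the most similar existing cluster.
        best, best_sim = None, 0.0
        for i, centroid in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        # Thresholding role: join that cluster only if it is similar enough.
        if best is not None and best_sim >= threshold:
            clusters[best].update(vec)
            labels.append(best)
        else:
            clusters.append(Counter(vec))
            labels.append(len(clusters) - 1)
    return labels


stories = [
    "earthquake hits city rescue teams",
    "rescue teams search city after earthquake",
    "election results announced senate vote",
    "senate vote election recount",
]
print(incremental_cluster(stories))  # [0, 0, 1, 1]
```

Raising `threshold` in this sketch produces more, smaller clusters, while lowering it merges more stories into existing clusters; this is the same kind of trade-off that the miss rate and false accept rate measure.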