Multimedia Event Detection (MED) is an annual task in the NIST TRECVID evaluation, and requires participants to build indexing and retrieval systems for locating videos in which certain predefined events are shown. Typical systems focus heavily on the use of visual data. Audio data, however, also contains rich information that can be effectively used for video retrieval, and MED could benefit from the attention of researchers in audio analysis. We present several systems for performing MED using only audio data, report the results of each system on the TRECVID MED 2011 development dataset, and compare the strengths and weaknesses of each approach.
Index Terms: multimedia event detection, audio processing, video retrieval