The 2011 SESAME Multimedia Event Detection (MED) System

Murat Akbacak, Robert C. Bolles, J. Brian Burns, Mark Eliot, Aaron Heller, James A. Herson, Gregory Myers, Ramesh Nallapati, Eric Yeh, D.C. Koelma, Xirong Li, Masoud Mazloom, Chun Lee, Ram Nevatia, Pramod Sharma, Rémi Trichet
2011-01-01
Abstract: The SESAME team submitted four MED-11 runs that combined three kinds of video content extraction results: visual features, video OCR results, and motion features. The primary run and one of the secondary runs used two different methods of fusing visual features and OCR results; a third run combined visual features and motion features; and a fourth run combined visual features, OCR results, and motion features. Results were combined using rank-based fusion and weighted averages. Rank-based fusion of visual feature results and video OCR results (the primary run) performed best of the four runs. The runs with motion features, which were computed around keyframes, initially performed poorly, but a subsequent experiment showed that motion features can contribute to improved performance.
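The abstract names rank-based fusion and weighted averages but does not give the formulas used. The following is a minimal sketch of what the two combination schemes could look like, assuming each component (for example visual features or video OCR) yields one detection score per video over the same set of videos. The function names, the mean-rank rule, and the weights are illustrative assumptions, not the SESAME system's actual implementation.

```python
from typing import Dict, List


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each video id to its rank within one component (1 = highest score)."""
    ordered = sorted(scores, key=lambda vid: scores[vid], reverse=True)
    return {vid: position + 1 for position, vid in enumerate(ordered)}


def rank_fusion(components: List[Dict[str, float]]) -> Dict[str, float]:
    """Assumed rule: fuse by mean rank across components (lower is better)."""
    per_component = [ranks(scores) for scores in components]
    return {
        vid: sum(r[vid] for r in per_component) / len(per_component)
        for vid in components[0]
    }


def weighted_average(
    components: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Fuse by a normalized weighted average of raw scores (higher is better)."""
    total = sum(weights)
    return {
        vid: sum(w * scores[vid] for w, scores in zip(weights, components)) / total
        for vid in components[0]
    }


# Hypothetical scores for three videos from two components.
visual = {"a": 0.9, "b": 0.4, "c": 0.1}
ocr = {"a": 0.2, "b": 0.8, "c": 0.1}

fused_ranks = rank_fusion([visual, ocr])
fused_scores = weighted_average([visual, ocr], weights=[0.75, 0.25])
```

Rank fusion discards score magnitudes, so it needs no calibration between components whose scores live on different scales; the weighted average keeps magnitudes but depends on the scores being comparable and on how the weights are chosen.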