Multimedia Evidence Fusion for Video Concept Detection Via OWA Operator.

Ming Li, Yan-Tao Zheng, Shou-Xun Lin, Yong-Dong Zhang, Tat-Seng Chua
DOI: https://doi.org/10.1007/978-3-540-92892-8_21
2009-01-01
Abstract: We present a novel multi-modal evidence fusion method for high-level feature (HLF) detection in videos. Uni-modal features, such as color histograms and transcript texts, tend to capture different aspects of HLFs and hence exhibit both complementariness and redundancy in modeling the contents of such HLFs. We argue that these inter-relations are key to effective multi-modal fusion. Here, we formulate the fusion as a multi-criteria group decision making task, in which the uni-modal detectors are coordinated to reach a consensus final detection decision based on their inter-relations. Specifically, we mine the complementariness and redundancy inter-relations of the uni-modal detectors using the Ordered Weighted Average (OWA) operator. The 'or-ness' measure in OWA models the inter-relation of uni-modal detectors as a combination of pure complementariness and pure redundancy. The resulting OWA weights then yield a consensus fusion by optimally leveraging the decisions of the uni-modal detectors. Experiments on the TRECVID 07 dataset show that the proposed OWA aggregation operator significantly outperforms other fusion methods, achieving a state-of-the-art MAP of 0.132.
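As a rough illustration of the operator named in the abstract, the sketch below follows the standard textbook definition of OWA (sort the inputs in descending order, then take a weighted sum by position) and of the or-ness measure. It is not the authors' implementation: the function names `owa` and `orness`, the detector scores, and the weight vectors are all hypothetical, and the paper's procedure for obtaining the weights is not reproduced here.

```python
def owa(scores, weights):
    """Ordered Weighted Average: weights apply to rank positions, not to sources."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(scores, reverse=True)  # largest score first
    return sum(w * s for w, s in zip(weights, ordered))


def orness(weights):
    """Or-ness of an OWA weight vector: 1.0 behaves like max, 0.0 like min."""
    n = len(weights)
    if n < 2:
        raise ValueError("or-ness needs at least two weights")
    # Position i (1-based) contributes (n - i) * w_i, normalized by (n - 1).
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


# Hypothetical confidence scores from three uni-modal detectors.
detector_scores = [0.2, 0.9, 0.5]

max_like = [1.0, 0.0, 0.0]       # all weight on the highest score
min_like = [0.0, 0.0, 1.0]       # all weight on the lowest score
uniform = [1 / 3, 1 / 3, 1 / 3]  # plain arithmetic mean

fused_max = owa(detector_scores, max_like)
fused_min = owa(detector_scores, min_like)
fused_mean = owa(detector_scores, uniform)
```

With these illustrative inputs, the or-ness of the three weight vectors is 1.0, 0.0, and 0.5 respectively, so intermediate weight vectors interpolate between the max-like and min-like extremes.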