Abstract: Matching objects across multiple cameras with non-overlapping views is a necessary but difficult task in wide-area video surveillance. Owing to the lack of spatio-temporal information, only visual information can be used in some scenarios, especially when the cameras are widely separated. This paper proposes a novel framework based on multi-feature fusion and incremental learning to match objects across disjoint views in the absence of space–time cues. We first develop a competitive major feature histogram fusion representation (CMFH) to formulate the appearance model that characterizes potentially matching objects. Because the appearance of an object can change over time, the models should be continuously updated. We therefore adopt an improved incremental general multicategory support vector machine algorithm (IGMSVM) to update the appearance models online and to match objects by classification. Our method needs only a small number of samples to build an accurate classification model. Several tests are performed on the CAVIAR, ISCAPS and VIPeR databases, in which the objects change significantly due to variations in viewpoint, illumination and pose. Experimental results demonstrate the advantages of the proposed methodology in terms of computational efficiency, storage requirements, and matching accuracy over other state-of-the-art classification-based matching approaches. The system developed in this research can be used in real-time video surveillance applications.

Combining Multi-Representation for Multimedia Event Detection Using Co-Training

Multimodal feature fusion for robust event detection in web videos

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching

Multimedia Event Detection Using A Classifier-Specific Intermediate Representation

Searching Persuasively: Joint Event Detection And Evidence Recounting With Limited Supervision

Multimodal Sparse Coding for Event Detection

IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System

Bi-Level Semantic Representation Analysis for Multimedia Event Detection

Multimedia Event Detection and Recounting

Video object matching across multiple non-overlapping camera views based on multi-feature fusion and incremental learning

Resource Constrained Multimedia Event Detection

Multimodal Deep Representation Learning for Video Classification

Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos

IBM Research and Columbia University TRECVID-2012 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), and Semantic Indexing (SIN) Systems

Informedia@TRECVID 2013

Informedia@TRECVID 2014 MED and MER

Modality Mixture Projections for Semantic Video Event Detection

Multi-View Exclusive Unsupervised Dimension Reduction for Video-Based Facial Expression Recognition

Event-centric multi-modal fusion method for dense video captioning

Multi-Feature Fusion Via Hierarchical Regression for Multimedia Analysis