Abstract: Video analysis has been attracting increasing research attention due to the proliferation of internet videos. In this paper, we investigate how to improve the performance of analyzing internet-quality videos. In particular, we focus on the scenario in which only a few labeled training videos are provided, which has received less attention in the multimedia community. To begin with, we consider how to more effectively harness the evidence from low-level features. Researchers have developed several promising features that represent videos so as to capture their semantic information. However, as videos usually contain rich semantic content, the analysis performance achieved with any single feature is potentially limited. Simply combining multiple features through early fusion or late fusion to incorporate more informative cues is feasible but not optimal, due to the heterogeneity and differing predictive capability of these features. To better exploit multiple features, we propose to mine the importance of different features and incorporate it into the learning of the classification model. Our method is based on multiple graphs built from different features and uses the Riemannian metric to evaluate feature importance. In addition, to achieve respectable accuracy with limited labeled training videos, we formulate our method in a semi-supervised manner. The main contribution of this paper is a novel scheme for evaluating feature importance, which is further incorporated into a unified framework that harnesses multiple weighted features with limited labeled training videos. We perform extensive experiments on video action recognition and multimedia event recognition, and comparisons with other state-of-the-art multi-feature learning algorithms validate the efficacy of our framework.
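The abstract does not spell out the optimization, so the following is only a minimal sketch of the general idea it describes: semi-supervised classification on a weighted combination of per-feature graphs. It assumes fully connected Gaussian graphs, normalized graph Laplacians, and a standard quadratic label-fitting objective (in the spirit of label spreading). The per-feature weights are passed in by hand as a hypothetical input, because the paper's Riemannian-metric importance scheme and optimal thresholding are not reproduced here. The function names `normalized_laplacian` and `propagate_labels` and the parameters `sigma` and `mu` are illustrative, not taken from the paper.

```python
import numpy as np


def normalized_laplacian(X, sigma):
    """Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.

    W is a fully connected Gaussian affinity over the rows of X
    (one row per video), with self-loops removed.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; clip round-off negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def propagate_labels(features, weights, labels, n_classes, sigma=1.0, mu=100.0):
    """Semi-supervised classification on a weighted sum of per-feature graphs.

    features  : list of (n, d_v) arrays, one per feature type, same n videos
    weights   : non-negative importance of each feature type (hypothetical
                input; the paper's Riemannian-metric weighting is NOT computed)
    labels    : length-n int array, class index if labelled, -1 otherwise
    sigma, mu : kernel bandwidth and label-fitting strength (assumed values)
    """
    alpha = np.asarray(weights, dtype=float)
    alpha = alpha / alpha.sum()  # normalize weights to sum to one
    # Combined Laplacian: each feature's graph counts in proportion to alpha_v.
    L = sum(a * normalized_laplacian(X, sigma) for a, X in zip(alpha, features))

    labelled = labels >= 0
    Y = np.zeros((len(labels), n_classes))
    Y[labelled, labels[labelled]] = 1.0          # one-hot targets
    U = np.diag(np.where(labelled, mu, 0.0))     # fit only labelled videos

    # Minimiser of tr(F^T L F) + tr((F - Y)^T U (F - Y)) solves (L + U) F = U Y.
    F = np.linalg.solve(L + U, U @ Y)
    return F.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: one feature separates two classes, the other is pure noise.
    informative = np.vstack([rng.normal(0.0, 0.3, (10, 1)),
                             rng.normal(10.0, 0.3, (10, 1))])
    noise = rng.normal(0.0, 1.0, (20, 3))
    labels = -np.ones(20, dtype=int)
    labels[0], labels[10] = 0, 1  # one labelled video per class
    print(propagate_labels([informative, noise], [1.0, 0.0], labels, 2))
    print(propagate_labels([informative, noise], [0.5, 0.5], labels, 2))
```

With weights `[1.0, 0.0]` the noise graph is switched off, and the two clusters are recovered from one labelled video each. Mixing in the noise graph, as with `[0.5, 0.5]`, can blur the combined graph. A feature-importance scheme is meant to handle exactly this kind of case, but here it is done automatically rather than by hand.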

Feature Weighting via Optimal Thresholding for Video Analysis


Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Fusion of infrared and visual images through multiscale hybrid unidirectional total variation

Multimodal feature fusion for robust event detection in web videos

Intensity/Inertial Integration-Aided Feature Tracking on Event Cameras

Small Low-Contrast Target Detection: Data-Driven Spatiotemporal Feature Fusion and Implementation

Infrared and Radar Fusion Detection Method Based on Heterogeneous Data Preprocessing

Detail enhanced multi-source fusion using visual weight map extraction based on multi scale edge preserving decomposition

Multimedia Evidence Fusion for Video Concept Detection Via OWA Operator

Multiple Feature Fusion Via Weighted Entropy for Visual Tracking

Resource Constrained Multimedia Event Detection

Infrared and Visible Image Fusion Using Threshold Segmentation and Weight Optimization

Exploring fusion strategies for accurate RGBT visual object tracking

An Improved C-COT Based Visual Tracking Scheme to Weighted Fusion of Diverse Features

Dynamic Multimodal Fusion in Video Search

Multiple Features but Few Labels?

Adaptive Feature Aggregation for Video Object Detection

Fusion Detection via Distance-Decay Intersection over Union and Weighted Dempster-Shafer Evidence Theory

Video Event Detection Using Motion Relativity and Feature Selection

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval