Abstract: Video semantic recognition usually suffers from the curse of dimensionality and a shortage of high-quality labeled instances; semisupervised feature selection has therefore gained increasing attention for its efficiency and comprehensibility. Most previous methods assume that videos that are close to each other (neighbors) have similar labels, and they characterize the intrinsic local structure through a predetermined graph built over both labeled and unlabeled data. However, besides the parameter tuning problem underlying the construction of the graph, the affinity measurement in the original feature space usually suffers from the curse of dimensionality. Additionally, the predetermined graph is constructed separately from the feature selection procedure, which might degrade performance for video semantic recognition. In this paper, we propose a novel semisupervised feature selection method from a new perspective. The primary assumption underlying our model is that instances with similar labels should have a larger probability of being neighbors. Instead of using a predetermined similarity graph, we incorporate the exploration of the local structure into the procedure of joint feature selection so as to learn the optimal graph simultaneously. Moreover, an adaptive loss function is exploited to measure the label fitness, which significantly enhances the model's robustness to videos with a small or substantial loss. We propose an efficient alternating optimization algorithm to solve the resulting challenging problem, together with theoretical analyses of its convergence and computational complexity. Finally, extensive experimental results on benchmark datasets illustrate the effectiveness and superiority of the proposed approach on tasks related to video semantic recognition.
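To make the "predetermined graph" of earlier methods concrete, the following is a minimal sketch of one common way such a graph could be built: a k-nearest-neighbor affinity matrix with Gaussian weights, computed in the original feature space. The abstract does not say which construction prior work uses, so the Gaussian kernel, the function name `knn_gaussian_graph`, and the parameters `k` and `sigma` are all assumptions made for illustration. This is not the method proposed in the paper, which learns the graph jointly with feature selection.

```python
import numpy as np

def knn_gaussian_graph(X, k=5, sigma=1.0):
    """Build a symmetric k-nearest-neighbor affinity matrix with Gaussian weights.

    X: (n_samples, n_features) array of labeled and unlabeled feature vectors.
    k, sigma: hand-tuned hyperparameters (hypothetical defaults, assumed here).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances in the original feature space.
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    dist2 = np.maximum(dist2, 0.0)   # clip small negatives from round-off
    np.fill_diagonal(dist2, np.inf)  # exclude each point from its own neighbors

    W = np.zeros((n, n))
    # Indices of the k closest points for every row.
    nn_idx = np.argsort(dist2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn_idx.ravel()
    W[rows, cols] = np.exp(-dist2[rows, cols] / (2.0 * sigma ** 2))
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)

# Usage: two well-separated pairs of points, one neighbor each.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
W_demo = knn_gaussian_graph(X_demo, k=1, sigma=1.0)
```

Both `k` and `sigma` must be chosen by hand before any feature selection takes place, and the distances are measured over all original features. These are the two drawbacks the abstract attributes to predetermined graphs.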

A Statistics-Based Method For Video Semantic Analysis

An Unsupervised Video Summarization Method Based on Multimodal Representation

Multimodal Salient Objects: General Building Blocks Of Semantic Video Concepts

An HMM-based framework for video semantic analysis

Research on Semantic Extraction of Online Mathematics Guidance Videos in Colleges and Universities Based on Intelligent Information Technology

Semantic Content Mining Approach in Video Based on Vector Space Model

Video Structural Description: A Semantic Based Model for Representing and Organizing Video Surveillance Big Data

Non-rigid Video Object Segmentation Based on Semantic Multi-level Framework

Video News Indexing Using Semantic-Face

Video Semantic Concept Detection Using Multi-Modality Subspace Correlation Propagation

Video structural description technology for the new generation video surveillance systems

Video Data Mining: Semantic Indexing and Event Detection from the Association Perspective

An Adaptive Semisupervised Feature Analysis for Video Semantic Recognition

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Bi-Level Semantic Representation Analysis for Multimedia Event Detection

Semantic based representing and organizing surveillance big data using video structural description technology

Extracting Multimedia Semantics Based On Independent Modality Discovering And Fusion

Content-Based Video Browsing by Text Region Localization and Classification

Scene Segmentation Based on Video Structure and Spectral Methods

A Fusion Scheme of Visual and Auditory Modalities for Event Detection in Sports Video