Abstract: Multispectral images (MSIs) have widespread applications, and efficiently managing large MSI archives via remote sensing image retrieval (RSIR) is key to boosting their practical value. While current deep learning-based methods offer strong image representation learning capabilities, adapting to the complex and dynamic relationships between objects and spectral information in MSIs remains challenging. The difficulty stems from the distinct attributes of different spectral bands and from existing methods' neglect of interactions among spectral combinations, which limits retrieval performance. To address this, we propose a dynamic learning system inspired by human communication, named the semantic–view collaborative network (SVCNet), which actively promotes the interaction between spectral and semantic information. By linking multiview learning (MVL) with graph neural networks (GNNs) to simulate three stages of human communication (understanding, communication, and collective consensus and reflection), SVCNet enhances RSIR with flexibility in representation extraction. Specifically, in the understanding phase, each spectral combination is processed to extract an independent representation as view-specific knowledge. In the communication phase, we devise the graph attention-based multiround communication module (GACM), which uses a GNN to perform graph-structured modeling and adaptive updating of views and semantics. Moreover, we achieve improved MSI representations by implementing novel objective functions that align learned semantics with category information, dynamically differentiate semantic similarities and disparities in MSIs, and flexibly weight samples for enhanced adaptability in a multilabel RSIR environment. SVCNet surpasses current state-of-the-art methods on three MSI datasets for single-label and multilabel retrieval tasks. It effectively handles class imbalance and distinguishes challenging samples, indicating broad applicability.
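As a rough illustration of the communication phase described above, the sketch below treats each spectral-combination view as a node in a fully connected graph and lets the nodes exchange information over several rounds. It is not the authors' GACM: the abstract does not specify the attention form, so plain scaled dot-product attention is swapped in, and the names `communicate`, `rounds`, and `mix`, the residual mixing rule, and the mean-pooling "consensus" step are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def communicate(views, rounds=3, mix=0.5):
    """Hypothetical multiround message passing among view nodes.

    views:  (num_views, dim) array, one row per spectral-combination view.
    rounds: number of communication rounds (assumed hyperparameter).
    mix:    weight given to the aggregated messages (assumed hyperparameter).
    """
    h = np.asarray(views, dtype=float)
    dim = h.shape[1]
    for _ in range(rounds):
        scores = h @ h.T / np.sqrt(dim)   # pairwise view affinities
        attn = softmax(scores, axis=1)    # each row sums to 1
        # Each view keeps part of its own state and absorbs an
        # attention-weighted average of all views (itself included).
        h = (1.0 - mix) * h + mix * (attn @ h)
    return h

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 8))    # 4 toy views with 8-dim features
updated = communicate(views)
consensus = updated.mean(axis=0)   # simple stand-in for a consensus step
```

Because every update is a convex combination of the current view vectors, repeated rounds pull the views toward one another; a learned module would instead use trainable projections and could also include semantic (category) nodes in the graph.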

Learning Socially Embedded Visual Representation from Scratch

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Multimodal Learning of Social Image Representation by Exploiting Social Relations

Unsupervised Teacher-Student Model for Large-Scale Video Retrieval

Exploring Visual Engagement Signals for Representation Learning

Learning a Self-Expressive Network for Subspace Clustering

Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking

Social-Sensed Image Search

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Collaborative Feature Learning from Social Media

Learning Robust Visual-Semantic Embeddings

CEIR: Concept-based Explainable Image Representation Learning

Social Visual Image Ranking for Web Image Search

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Social Embedding Image Distance Learning

Social-oriented Visual Image Search

Adaptive Learning on User Segmentation: Universal to Specific Representation via Bipartite Neural Interaction

COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

Semi-Heterogeneous Three-Way Joint Embedding Network for Sketch-Based Image Retrieval

D-Sempre: Learning Deep Semantic-Preserving Embeddings for User interests-Social Contents Modeling

Human Communication-Inspired Semantic–View Collaborative Network for Multispectral Remote Sensing Image Retrieval