Abstract: Deep semantic matching aims to discriminate the relationship between documents using deep neural networks. In recent years, it has become increasingly popular to organize documents in a graph structure and then leverage both the intrinsic document features and the extrinsic neighbor features to make this discrimination. Most existing work focuses on how to utilize the given neighbors, whereas limited effort has been made to filter for appropriate neighbors. We argue that neighbor features can be highly noisy and only partially useful. Thus, a lack of effective neighbor selection not only incurs a great deal of unnecessary computation cost but also severely restricts matching accuracy. In this work, we propose a novel framework, Cascaded Deep Semantic Matching (CDSM), for accurate and efficient semantic matching on textual graphs. CDSM is distinguished by its two-stage workflow. In the first stage, a lightweight CNN-based ad-hoc neighbor selector is deployed to filter useful neighbors for the matching task at a small computation cost. We design both one-step and multi-step selection methods. In the second stage, a high-capacity graph-based matching network is employed to compute fine-grained relevance scores based on the well-selected neighbors. It is worth noting that CDSM is a generic framework that accommodates most mainstream graph-based semantic matching networks. The major challenge is how the selector can learn to discriminate the usefulness of neighbors, for which no explicit labels exist. To cope with this problem, we design a weak-supervision strategy for optimization: we first train the graph-based matching network, and then the ad-hoc neighbor selector is learned on top of the annotations from the matching network. We conduct extensive experiments on three large-scale datasets, showing that CDSM notably improves semantic matching accuracy and efficiency thanks to the selection of high-quality neighbors.
The source code is released at https://github.com/jingjyyao/CDSM.
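
The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not the released CDSM implementation: token overlap stands in for both the CNN-based selector and the graph-based matching network, and every name here (`selector_score`, `select_one_step`, `matcher_score`, `cascade_match`, `weak_label`) is hypothetical. The rule in `weak_label` is only one plausible reading of "annotations from the matching network" and is an assumption.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a toy stand-in for a learned text encoder."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def selector_score(counterpart: str, neighbor: str) -> float:
    """Stage 1 (hypothetical): cheap usefulness score of a neighbor with
    respect to the document on the other side of the pair."""
    return jaccard(tokens(counterpart), tokens(neighbor))


def select_one_step(counterpart: str, neighbors: List[str], k: int) -> List[str]:
    """One-step selection: score every neighbor once and keep the top k."""
    ranked = sorted(neighbors, key=lambda n: selector_score(counterpart, n), reverse=True)
    return ranked[:k]


def matcher_score(query: str, q_nbrs: List[str], key: str, k_nbrs: List[str]) -> float:
    """Stage 2 (hypothetical): stand-in for the high-capacity matcher.
    Each document is pooled with its selected neighbors before comparison."""
    q_repr = tokens(query).union(*(tokens(n) for n in q_nbrs))
    k_repr = tokens(key).union(*(tokens(n) for n in k_nbrs))
    return jaccard(q_repr, k_repr)


def cascade_match(query: str, q_neighbors: List[str],
                  key: str, k_neighbors: List[str], k: int = 1) -> float:
    """Cascade: filter neighbors cheaply, then run the expensive matcher
    only on the neighbors that survived selection."""
    q_sel = select_one_step(key, q_neighbors, k)
    k_sel = select_one_step(query, k_neighbors, k)
    return matcher_score(query, q_sel, key, k_sel)


def weak_label(query: str, key: str, neighbor: str, relevant: bool) -> int:
    """Weak supervision (assumed form): a neighbor is labeled useful (1) if
    adding it moves the trained matcher's score toward the ground truth."""
    base = matcher_score(query, [], key, [])
    with_neighbor = matcher_score(query, [neighbor], key, [])
    if relevant:
        return int(with_neighbor > base)
    return int(with_neighbor < base)


if __name__ == "__main__":
    query = "deep semantic matching"
    key = "semantic matching on graphs"
    q_neighbors = ["graphs for matching text", "cooking pasta recipes"]
    print(select_one_step(key, q_neighbors, k=1))
    print(cascade_match(query, q_neighbors, key, [], k=1))
    print([weak_label(query, key, n, relevant=True) for n in q_neighbors])
```

In this toy setup the selector keeps the on-topic neighbor and discards the unrelated one, so the stage-two matcher processes one neighbor instead of two. The labels produced by `weak_label` are the kind of signal a selector could be trained on once the matcher is fixed.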

CDSM: Cascaded Deep Semantic Matching on Textual Graphs Leveraging Ad-hoc Neighbor Selection
