Abstract: Despite significant progress, cross-modal retrieval still struggles with one-to-many matching, where a given query may correspond to multiple semantic instances in another modality. Existing approaches usually map heterogeneous data into the learned space as deterministic point vectors. Although such point embeddings perform remarkably well at matching the single most similar instance, they cannot adequately represent the rich semantics of one-to-many correspondence. To address this limitation, we extend a deterministic point into a closed geometry and develop geometric representation learning methods for cross-modal retrieval. A set of points inside such a geometry can be semantically related to many candidates, which allows us to capture semantic uncertainty effectively. We then introduce two types of geometric matching for one-to-many correspondence: point-to-rectangle matching (dubbed P2RM) and rectangle-to-rectangle matching (termed R2RM). The former treats all retrieved candidates as rectangles with zero volume (equivalent to points) and the query as a rectangle, while the latter encodes all heterogeneous data into rectangles. Semantic similarity among heterogeneous data can therefore be evaluated by the Euclidean distance from a point to a rectangle or by the volume of intersection between two rectangles. Additionally, both strategies can be easily applied to off-the-shelf approaches and further improve the retrieval performance of the baselines. Extensive experiments and ablation studies under various evaluation metrics on several commonly used datasets, two for image-text matching and two for video-text retrieval, demonstrate the effectiveness and superiority of our methods.
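The two similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes axis-aligned rectangles described by a lower and an upper corner; the abstract does not state how the rectangles are parameterized, so this representation and the function names `point_to_box_distance` and `box_intersection_volume` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def point_to_box_distance(point: Sequence[float],
                          lower: Sequence[float],
                          upper: Sequence[float]) -> float:
    """Euclidean distance from a point to an axis-aligned box.

    Assumption: the box is given by its lower and upper corners.
    The distance is zero when the point lies inside the box.
    """
    squared = 0.0
    for p, lo, hi in zip(point, lower, upper):
        # Per-dimension gap: how far the coordinate falls outside [lo, hi].
        gap = max(lo - p, 0.0, p - hi)
        squared += gap * gap
    return math.sqrt(squared)


def box_intersection_volume(lower_a: Sequence[float],
                            upper_a: Sequence[float],
                            lower_b: Sequence[float],
                            upper_b: Sequence[float]) -> float:
    """Volume of the overlap between two axis-aligned boxes (0 if disjoint)."""
    volume = 1.0
    for la, ua, lb, ub in zip(lower_a, upper_a, lower_b, upper_b):
        # Overlap length along this dimension.
        side = min(ua, ub) - max(la, lb)
        if side <= 0.0:
            return 0.0
        volume *= side
    return volume


# Point-to-rectangle: a candidate point scored against a query box.
d_outside = point_to_box_distance([4.0, 5.0], [0.0, 0.0], [1.0, 1.0])
d_inside = point_to_box_distance([0.5, 0.5], [0.0, 0.0], [1.0, 1.0])

# Rectangle-to-rectangle: overlap between a query box and a candidate box.
v_overlap = box_intersection_volume([0.0, 0.0], [2.0, 2.0],
                                    [1.0, 1.0], [3.0, 3.0])
```

Under this sketch a smaller point-to-rectangle distance or a larger intersection volume would indicate a closer semantic match. A zero-volume rectangle always yields zero intersection volume, which is consistent with the point-to-rectangle case being scored by distance instead.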

Plane Geometry Figure Retrieval Based on Bilayer Geometric Attributed Graph Matching

Plane Geometry Figure Retrieval with Bag of Shapes

Structure Analysis For Plane Geometry Figures

Improving retrieval of plane geometry figure with learning to rank

BHoG: Binary Descriptor for Sketch-Based Image Retrieval

Shape Retrieval Method of 3D Models Based on Shape Distribution Graph and BP Neural Network

Improving PGF Retrieval Effectiveness with Active Learning

Analysis of Stroke Intersection for Overlapping PGF Elements

A Diagram Retrieval Method With Multi-Label Learning

Multi-layered Geometry Image Representation of Point Cloud Surfaces

Detection of Overlapped Quadrangles in Plane Geometric Figures

PAGML: Precise Alignment Guided Metric Learning for sketch-based 3D shape retrieval

G$^3$-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding

An Unified CGA-Based Formal Expression of Spatio-Temporal Topological Relations for Computation and Analysis of Geographic Objects

Overlapped-Triangle Analysis With Hierarchical Ranking Of Dominance

Graph Geometry Interaction Learning

Geometry-Based Feature Selection and Deep Aggregation Model for Architectural Scenery Recomposition Toward Education

PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems

Geometric Matching for Cross-Modal Retrieval

An Online Composite Graphics Recognition Approach Based on Matching of Spatial Relation Graphs

Layered Graph Match with Graph Editing