Abstract: Recovering camera pose from two-view images is a critical problem in photogrammetry and computer vision. In complex scenarios, point correspondences constructed by an off-the-shelf feature matcher such as SIFT are often corrupted by heavy outliers. In this case, traditional methods based on sampling consensus or motion/geometrical coherence struggle because their assumptions no longer hold. To this end, we propose a deep technique that better extracts the underlying geometric information from the high-dimensional feature space for two-view geometry estimation. Unlike existing deep methods that use distribution-based normalization or explicitly aggregate neighboring correspondences, we propose a graph attention operation with a multi-head mechanism, termed GANet, to latently capture fine-grained contextual/geometrical relations among these corrupted correspondences. This encourages our network to learn informative representations that yield high graph similarity, thus focusing more on inliers and suppressing outliers. On this basis, our network can more easily infer the inliers best suited to recovering camera pose. Moreover, we observe that the calculation of graph similarity for each node is supported by only part of the node features. We therefore further propose a lightweight implementation of graph attention, namely Sparse GANet, which learns a sparse attention map based on a block-wise operation and Sinkhorn normalization. This sparse strategy largely reduces the memory and computational requirements while maintaining performance. Extensive experiments on pose estimation, outlier rejection, and image registration on different challenging datasets, together with combinational tests with different descriptor matchers and robust estimators, demonstrate the superiority and strong generalization of our method against the state of the art.
In particular, we achieve improvements of at least 1.5% and 0.6% in mAP@5° on YFCC and SUN3D data for pose estimation, respectively. Our Sparse GANet also reduces the model size to only 0.28 MB and the time cost to 16 ms, which is significantly better than SuperGlue, which requires 12.02 MB and 68 ms. (Source code is available at https://github.com/StaRainJ/Code-of-GANet.)
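As a rough illustration of the Sinkhorn normalization mentioned above, the sketch below alternately normalizes the rows and columns of a square score matrix so that it approaches a doubly stochastic attention map. It is not the authors' implementation: the function name `sinkhorn_normalize`, the iteration count, the random scores, and the use of NumPy are all assumptions made for illustration, and the block-wise sparsification step is not shown.

```python
import numpy as np


def sinkhorn_normalize(scores, n_iters=50, eps=1e-9):
    """Hypothetical helper: push a square score matrix toward a doubly
    stochastic matrix by alternating row and column normalization."""
    # Exponentiate (shifted for numerical stability) so every entry is positive.
    p = np.exp(scores - scores.max())
    for _ in range(n_iters):
        p = p / (p.sum(axis=1, keepdims=True) + eps)  # make rows sum to ~1
        p = p / (p.sum(axis=0, keepdims=True) + eps)  # make columns sum to ~1
    return p


# Assumed toy input: pairwise similarity scores among 8 correspondences.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 8))
attn = sinkhorn_normalize(scores)
```

After the loop the columns sum to one up to `eps`, and the rows approach one as the iterations converge. A sparse variant would apply the same normalization within blocks of the map rather than across the full matrix.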

CA-GAN: Object Placement Via Coalescing Attention Based Generative Adversarial Network

OAW-GAN: Occlusion-Aware Warping GAN for Unified Human Video Synthesis

Unpaired Salient Object Translation Via Spatial Attention Prior

OA-GAN: Organ-Aware Generative Adversarial Network for Synthesizing Contrast-Enhanced Medical Images

Two Birds with One Stone: Iteratively Learn Facial Attributes with GANs

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Two Birds with One Stone: Transforming and Generating Facial Images with Iterative GAN

Customizable GAN: Customizable Image Synthesis Based on Adversarial Learning

Object-driven Text-to-Image Synthesis via Adversarial Training

Spatial Fusion GAN for Image Synthesis

Interactive Image Synthesis with Panoptic Layout Generation

Aggregated Contextual Transformations for High-Resolution Image Inpainting

Learning for mismatch removal via graph attention networks

Domain adaptive person search via GAN-based scene synthesis for cross-scene videos

A Generative Adversarial Framework for Optimizing Image Matting and Harmonization Simultaneously

FBC-GAN: Diverse and Flexible Image Synthesis via Foreground-Background Composition

ARRPNGAN: Text-to-image GAN with attention regularization and region proposal networks

Dual Attention GANs for Semantic Image Synthesis

Spatial Content Alignment For Pose Transfer