Abstract: Keypoint detection and descriptor matching are two vital steps in 3D feature extraction, but their inherent discreteness makes them difficult to learn end to end. To handle these non-differentiable operations, we formulate feature extraction as a decision-making problem: the network is treated as a policy pool that makes probabilistic estimates for keypoint selection and feature matching and is supervised by maximizing the expected reward of its actions. On this basis, we propose a novel end-to-end training paradigm for 3D feature extraction based on the stochastic policy gradient method, named Reinforced Detectors and Descriptors (RDD). First, we propose a local-to-global probabilistic keypoint selection module that models keypoint sampling probabilities at both the local and global levels to yield sparse and accurate keypoints. Second, we treat feature matching as an optimal transport problem and use an efficient Sinkhorn method to solve for the optimal matching probabilities. In particular, we carefully design a reward function and derive the gradients of the probabilistic actions, which overcomes the discreteness and provides reinforced supervision signals. Because our reward function is computed from sampled keypoints rather than from randomly sampled points as in existing methods, the gap between training and inference is bridged. Experimental results demonstrate that our approach outperforms state-of-the-art methods and generalizes well. Notably, when aligning scenes with a small number of keypoints, our approach achieves significantly higher Registration Recall than other advanced methods, owing to its highly accurate and repeatable detector.
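The two ingredients named in the abstract, Sinkhorn-based soft matching and a policy-gradient signal for discrete keypoint selection, can be illustrated with a minimal sketch. This is not the RDD implementation: the function names `sinkhorn` and `reinforce_gradient`, the uniform marginals, the independent Bernoulli selection per point, and the toy linear reward are all assumptions made for illustration. The abstract does not specify the paper's reward function or its local-to-global module, and neither is reproduced here.

```python
import math
import random


def sinkhorn(scores, n_iters=50, eps=0.5):
    """Entropy-regularized optimal transport with uniform marginals (assumed).

    scores: n x m list of similarity scores (higher means a better match).
    Returns an n x m soft matching matrix whose rows sum to about 1/n and
    whose columns sum to 1/m.
    """
    n, m = len(scores), len(scores[0])
    # Gibbs kernel; subtracting the maximum keeps exp() from overflowing.
    top = max(max(row) for row in scores)
    K = [[math.exp((s - top) / eps) for s in row] for row in scores]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to hit the target marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_gradient(logits, reward_fn, n_samples=20000, seed=0):
    """Score-function (REINFORCE) estimate of d E[reward] / d logits.

    Point i is kept with probability p_i = sigmoid(logits[i]). For a
    Bernoulli action a_i, d log P(a_i) / d logit_i = a_i - p_i, so the
    reward can be non-differentiable and still yield a gradient signal.
    """
    rng = random.Random(seed)
    probs = [sigmoid(z) for z in logits]
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        actions = [1 if rng.random() < p else 0 for p in probs]
        r = reward_fn(actions)
        for i, (a, p) in enumerate(zip(actions, probs)):
            grad[i] += r * (a - p) / n_samples
    return grad


# Toy usage: three descriptors per scan, with the best matches on the diagonal.
plan = sinkhorn([[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]])

# Hypothetical reward: +1 for keeping points 0 and 1, -1 for points 2 and 3.
weights = [1.0, 1.0, -1.0, -1.0]
grad = reinforce_gradient(
    [0.0] * 4, lambda acts: sum(w * a for w, a in zip(weights, acts))
)
```

In this toy setting the estimated gradient raises the logits of rewarded points and lowers the others, which is how a sampled, discrete selection can still be trained. A smaller `eps` gives sharper matches but slows Sinkhorn convergence.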

RDD: Learning Reinforced 3D Detectors and Descriptors Based on Policy Gradient