Abstract: Although the recent progress in the deep neural network has led to the development of learnable local feature descriptors, there is no explicit answer for estimation of the necessary size of a neural network. Specifically, the local feature is represented in a low dimensional space, so the neural network should have more compact structure. The small networks required for local feature descriptor learning may be sensitive to initial conditions and learning parameters and more likely to become trapped in local minima. In order to address the above problem, we introduce an adaptive pruning Siamese Architecture based on neuron activation to learn local feature descriptors, making the network more computationally efficient with an improved recognition rate over more complex networks. Our experiments demonstrate that our learned local feature descriptors outperform the state-of-art methods in patch matching.

What problem does this paper attempt to address?

The paper addresses two related questions: how to estimate the necessary size of a neural network used to learn local feature descriptors, and how to keep small networks from becoming trapped in local minima and from being overly sensitive to initial conditions and learning parameters. Local features are usually represented in a low-dimensional space, so the network that learns them should be compact. A small network, however, is more strongly affected by its initial conditions and learning parameters during training and is more likely to settle in a local optimum. To resolve this, the authors introduce an adaptive pruning Siamese architecture based on neuron activation. It learns local feature descriptors while iteratively removing unimportant neurons, which makes the network more computationally efficient and improves the recognition rate. The experimental results show that the proposed method yields a more compact network and also outperforms existing methods on the patch-matching task.

### Main Contributions

1. **Adaptive Pruning Siamese Architecture**: Using neuron activation frequency as the indicator, unimportant neurons are removed iteratively, yielding a compact and efficient network.
2. **Improved Local Feature Descriptor Learning Strategy**: Building on the Optimal Brain Damage (OBD) theory, the network structure is optimized through pruning, which improves the quality of the feature descriptors.
3. **Experimental Verification**: A large number of experiments on the UBC patch data set demonstrate the effectiveness of the proposed method.

### Method Overview

- **Siamese Architecture**: Consists of two identical sub-networks and a loss module. The input is a pair of images and their label, and the output is a scalar energy.
- **Contrastive Loss Function**: Used to train the network, minimizing the distance between matching pairs and maximizing the distance between non-matching pairs.

  \[ L(N)=\frac{1}{2}\left(y^{(i)}\left\|f\left(x_{1}^{(i)}\right)-f\left(x_{2}^{(i)}\right)\right\|^{2}+\left(1-y^{(i)}\right)\max\left(0,\,m-\left\|f\left(x_{1}^{(i)}\right)-f\left(x_{2}^{(i)}\right)\right\|\right)^{2}\right) \]

  where \(f\left(x_{1}^{(i)}\right)\) and \(f\left(x_{2}^{(i)}\right)\) are the feature descriptors of the two images in the \(i\)-th pair, \(y^{(i)}\) is a binary label indicating whether the pair matches, and \(m\) is a margin parameter.
- **Adaptive Pruning**: Compute the activation frequency of each neuron, then gradually remove the neurons whose activation frequency is below 1%, together with their connections, until the network converges.

### Experimental Results

The network obtained by adaptive pruning is more compact than the original network and also performs better on the patch-matching task. Specifically:

- **MatchNet Model**: The error rate on the UBC data set drops from 9.79% to 9.17%, and the number of network parameters is significantly reduced.
- **Network Compression Rate**: The number of neurons in the fully-connected layers is reduced by 6.4%, 37.1% and 31.9% respectively.

In conclusion, the paper proposes an effective method for optimizing the learning of local feature descriptors. It improves model performance while making the network more compact, which suits resource-constrained environments.
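
The contrastive loss and the activation-frequency criterion described in the method overview can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the default margin of 1.0, and the definition of "active" as a strictly positive (post-ReLU) output are all assumptions made here. It shows only a single selection step; the paper's procedure repeats pruning and retraining until the network converges.

```python
import math


def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss for one descriptor pair.

    f1, f2: descriptors as equal-length sequences of floats.
    y: 1 for a matching pair, 0 for a non-matching pair.
    margin: the margin m (the default 1.0 is an assumption).
    """
    # Euclidean distance between the two descriptors.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    # Matching pairs are pulled together; non-matching pairs are
    # pushed apart until they are at least `margin` away.
    return 0.5 * (y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2)


def activation_frequency(activations):
    """Fraction of samples on which each neuron is active.

    activations: one row per sample, one column per neuron.
    A neuron counts as active when its output is > 0 (assumption).
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(1 for row in activations if row[j] > 0) / n_samples
        for j in range(n_neurons)
    ]


def neurons_to_keep(activations, threshold=0.01):
    """Indices of neurons whose activation frequency is not below 1%."""
    freqs = activation_frequency(activations)
    return [j for j, f in enumerate(freqs) if f >= threshold]


if __name__ == "__main__":
    # Matching pair at distance 5: the loss is half the squared distance.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))
    # Non-matching pair inside the margin is penalised.
    print(contrastive_loss([0.0, 0.0], [0.0, 0.5], y=0))
    # 200 samples, 3 neurons: neuron 1 fires on only one sample (0.5%).
    acts = [[1.0, 1.0 if i == 0 else 0.0, 1.0 if i < 10 else 0.0]
            for i in range(200)]
    print(neurons_to_keep(acts))
```

In a real network the kept indices would be used to slice the corresponding rows and columns out of the adjacent weight matrices before the next round of training.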

Local Feature Descriptor Learning with Adaptive Siamese Network

3D LiDAR-Based Global Localization Using Siamese Neural Network

Discriminatively Learning for Representing Local Image Features with Quadruplet Model

Robust Angular Local Descriptor Learning

Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

Learning a Local Feature Descriptor for 3D LiDAR Scans

Learning local descriptors with multi-level feature aggregation and spatial context pyramid

Adaptive Deconvolution-based stereo matching Net for Local Stereo Matching

Working hard to know your neighbor's margins: Local descriptor learning loss

MTLDesc: Looking Wider to Describe Better

Category-Aware Siamese Learning Network for Few-Shot Segmentation

Learning Semantic-Aware Local Features for Long Term Visual Localization

A Simple Task-aware Contrastive Local Descriptor Selection Strategy for Few-shot Learning between inter class and intra class

Deep Unsupervised Binary Descriptor Learning Through Locality Consistency and Self Distinctiveness

Spatial-Adaptive and Feature-Enhanced Siamese Network for Change Detection.

From the Alps to Sicily: a panorama of Italian lymphomas.

Embedded Spectral Descriptors: Learning the point-wise correspondence metric via Siamese neural networks

Deep Corner

AFSRNet: learning local descriptors with adaptive multi-scale feature fusion and symmetric regularization

LSNet: Extremely Light-Weight Siamese Network For Change Detection in Remote Sensing Image

Designing Lightweight Feature Descriptor Networks with Depthwise Separable Convolution