Abstract: Although recent progress in deep neural networks has led to the development of learnable local feature descriptors, there is no explicit answer for how to estimate the necessary size of a neural network. Specifically, the local feature is represented in a low-dimensional space, so the neural network should have a more compact structure. The small networks required for local feature descriptor learning may be sensitive to initial conditions and learning parameters and more likely to become trapped in local minima. To address this problem, we introduce an adaptive pruning Siamese architecture based on neuron activation to learn local feature descriptors, making the network more computationally efficient with an improved recognition rate over more complex networks. Our experiments demonstrate that our learned local feature descriptors outperform state-of-the-art methods in patch matching.
What problem does this paper attempt to address?
This paper addresses two related questions: how to estimate the necessary size of a neural network used to learn local feature descriptors, and how to overcome the tendency of small networks to fall into local minima during training and to be sensitive to initial conditions and learning parameters. Because local features are usually represented in a low-dimensional space, the network that learns them should be compact. However, small networks are, by their structure, more affected by initial conditions and learning parameters during training and more likely to converge to poor local optima.
To solve these problems, the authors introduce an adaptive pruning Siamese architecture based on neuron activation. It learns local feature descriptors while iteratively removing unimportant neurons, which makes the network more computationally efficient and improves its recognition rate. Experimental results show that the proposed method not only yields a more compact network but also outperforms existing methods on the patch-matching task.
### Main Contributions
1. **Adaptive Pruning Siamese Architecture**: Uses neuron activation frequency as an importance indicator to iteratively remove unimportant neurons, yielding a compact and efficient network.
2. **Improved Local Feature Descriptor Learning Strategy**: Builds on Optimal Brain Damage (OBD) theory, using pruning to optimize the network structure and improve the quality of the feature descriptors.
3. **Experimental Verification**: Conducts extensive experiments on the UBC patch dataset, demonstrating the effectiveness and superiority of the proposed method.
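As background for contribution 2, OBD ranks parameters by a second-order estimate of how much the training objective \(E\) would increase if a parameter were deleted. Assuming the Hessian of \(E\) is diagonal, training has reached a local minimum (zero gradient), and the objective is locally quadratic, the change in \(E\) caused by a perturbation \(\delta u\) of the parameters, and the resulting saliency of parameter \(u_k\), are:

\[
\delta E \approx \frac{1}{2}\sum_{i} h_{ii}\,\delta u_{i}^{2}, \qquad s_{k} = \frac{h_{kk}\,u_{k}^{2}}{2},
\]

where \(h_{ii}\) are the diagonal entries of the Hessian. Setting \(\delta u_k = -u_k\) (deleting the parameter) in the first expression gives \(s_k\). How the paper connects this criterion to neuron activation frequency is not detailed here, so the formula is background rather than the paper's exact pruning rule.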
### Method Overview
- **Siamese Architecture**: Consists of two identical, weight-sharing sub-networks and a loss module. The input is a pair of images and their label, and the output is a scalar energy.
- **Contrastive Loss Function**: Used to train the network. It minimizes the distance between matching pairs and pushes non-matching pairs apart until their distance exceeds the margin \(m\).
\[
L(N)=\frac{1}{2}\left(y^{(i)}\left\|f\left(x_{1}^{(i)}\right)-f\left(x_{2}^{(i)}\right)\right\|^{2}+(1 - y^{(i)})\max\left(0,m-\left\|f\left(x_{1}^{(i)}\right)-f\left(x_{2}^{(i)}\right)\right\|\right)^{2}\right)
\]
where \(f\left(x_{1}^{(i)}\right)\) and \(f\left(x_{2}^{(i)}\right)\) are the feature descriptors of the \(i\)-th pair of images, \(y^{(i)}\) is a binary label equal to 1 if the pair matches and 0 otherwise, and \(m\) is a margin parameter.
- **Adaptive Pruning**: Compute the activation frequency of each neuron and iteratively remove neurons that activate less than 1% of the time, together with their connections, until the network converges.
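The components above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions: the one-layer ReLU encoder, the helper names `encode`, `energy`, `contrastive_loss`, `activation_frequency`, and `prune_step`, and the definition of "activation" as a positive ReLU output are all hypothetical illustrations, not the paper's implementation. The energy is taken to be the Euclidean distance between the two descriptors, which is the distance term used in the loss. Only a single pruning step is shown; the training between steps and the convergence check are omitted.

```python
import math


# Hypothetical toy sub-network: one fully-connected layer followed by ReLU.
# Both branches call encode() with the SAME weights, which is what makes the
# architecture Siamese (shared parameters).
def encode(weights, x):
    """Return the descriptor ReLU(W x); `weights` holds one row per neuron."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def energy(weights, x1, x2):
    """Scalar energy (assumed): Euclidean distance between the two descriptors."""
    f1, f2 = encode(weights, x1), encode(weights, x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def contrastive_loss(d, y, m):
    """Per-pair contrastive loss: y = 1 for a matching pair, 0 otherwise; m is the margin."""
    return 0.5 * (y * d ** 2 + (1 - y) * max(0.0, m - d) ** 2)


def activation_frequency(weights, samples):
    """Fraction of samples on which each neuron's ReLU output is positive (assumed definition)."""
    counts = [0] * len(weights)
    for x in samples:
        for j, value in enumerate(encode(weights, x)):
            if value > 0.0:
                counts[j] += 1
    return [c / len(samples) for c in counts]


def prune_step(weights, samples, threshold=0.01):
    """Keep only neurons whose activation frequency is at least `threshold` (1% by default).

    Deleting a row removes the neuron and its incoming connections. In a deeper
    network the matching column of the next layer (its outgoing connections)
    would also have to be deleted.
    """
    freq = activation_frequency(weights, samples)
    return [row for row, f in zip(weights, freq) if f >= threshold]


# Usage: the third neuron never fires on these non-negative inputs, so it is pruned.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
samples = [[1.0, 2.0], [0.5, 0.0], [0.0, 3.0]]
W_pruned = prune_step(W, samples)
d = energy(W_pruned, [3.0, 0.0], [0.0, 4.0])
print(len(W_pruned), d, contrastive_loss(d, 1, 1.0), contrastive_loss(d, 0, 1.0))
# prints: 2 5.0 12.5 0.0
```

In the usage example the matching-pair loss grows with the distance, while the non-matching loss is zero because the distance already exceeds the margin of 1.0.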
### Experimental Results
The experiments show that the adaptively pruned network is more compact than the original network and also performs better on the patch-matching task. Specifically:
- **MatchNet Model**: The error rate on the UBC data set is reduced from 9.79% to 9.17%, and the number of network parameters is significantly reduced.
- **Network Compression Rate**: The number of neurons in the fully-connected layers is reduced by 6.4%, 37.1%, and 31.9%, respectively.
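To put the reported numbers in perspective, the short calculation below converts the error-rate change into absolute and relative terms and turns the per-layer reductions into the fraction of neurons kept. It uses only the figures listed above; the helper name `relative_reduction` is a hypothetical illustration.

```python
def relative_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


# Error rates reported for MatchNet on the UBC dataset (percent).
baseline_error, pruned_error = 9.79, 9.17
print(f"absolute drop: {baseline_error - pruned_error:.2f} percentage points")
print(f"relative drop: {relative_reduction(baseline_error, pruned_error):.2f}% of the baseline error")

# Fraction of neurons kept in each fully-connected layer, from the reported reductions.
layer_reductions = [6.4, 37.1, 31.9]
kept = [1 - r / 100 for r in layer_reductions]
print([f"{k:.1%}" for k in kept])
```

The 0.62-point drop corresponds to roughly a 6.3% relative reduction in error, and about 94%, 63%, and 68% of the neurons remain in the three fully-connected layers.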
In conclusion, this paper proposes an effective method for optimizing the learning of local feature descriptors that both improves model performance and makes the network more compact, which makes it suitable for resource-constrained environments.