Abstract: Robust, highly accurate camera-based localization is challenging when scene appearance varies significantly. In indoor environments, illumination changes and object occlusion can severely degrade visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop DEllipse-Net, a convolutional neural network (CNN)-based ellipse prediction network that integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using three-dimensional (3D) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and six-degree-of-freedom (6-DoF) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net is more robust than the other methods, achieving the highest ellipse prediction accuracy for 11 out of 23 objects. In the localization test with 15 pixels of noise, we achieve an Absolute Translation Error (ATE) of 0.077 m and an Absolute Rotation Error (ARE) of 2.70° on the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable: the model is only 18.6 MB, and a single model handles all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly improve the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization than the classical bag-of-words-based localization method and other ellipse-ellipsoid localization methods.
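To make the ellipse-ellipsoid relationship concrete, the following is a minimal sketch of the standard dual-quadric projection, in which an ellipsoid with dual quadric Q* projects to an image ellipse with dual conic C* = P Q* Pᵀ. It is not the paper's implementation: the function names `ellipsoid_dual_quadric` and `project_to_ellipse`, the pinhole intrinsics, and the example values are all assumptions chosen for illustration.

```python
import numpy as np


def ellipsoid_dual_quadric(semi_axes, rotation, center):
    """Build the 4x4 dual quadric Q* of an ellipsoid (hypothetical helper).

    semi_axes: (a, b, c) lengths; rotation: 3x3 world orientation;
    center: 3-vector world position.
    """
    Z = np.eye(4)
    Z[:3, :3] = rotation
    Z[:3, 3] = center
    a, b, c = semi_axes
    D = np.diag([a**2, b**2, c**2, -1.0])
    return Z @ D @ Z.T


def project_to_ellipse(Q_dual, K, R_cw, t_cw):
    """Project a dual quadric to an image ellipse via C* = P Q* P^T.

    R_cw, t_cw map world points into the camera frame. Assumes the
    ellipsoid lies entirely in front of the camera and does not contain
    the camera center; otherwise the outline is not an ellipse.
    Returns (center_xy, (semi_major, semi_minor), major_axis_angle_rad).
    """
    P = K @ np.hstack([R_cw, np.reshape(t_cw, (3, 1))])
    C = P @ Q_dual @ P.T
    C = C / -C[2, 2]                       # normalize so that C[2, 2] == -1
    center = -C[:2, 2]
    # With this normalization, C[:2, :2] = S - center center^T, where S is
    # the 2x2 shape matrix whose eigenvalues are the squared semi-axes.
    shape = C[:2, :2] + np.outer(center, center)
    eigvals, eigvecs = np.linalg.eigh(shape)   # eigenvalues in ascending order
    semi_major, semi_minor = np.sqrt(eigvals[1]), np.sqrt(eigvals[0])
    major_dir = eigvecs[:, 1]
    angle = np.arctan2(major_dir[1], major_dir[0])
    return center, (semi_major, semi_minor), angle


# Illustrative values only: a sphere of radius 0.5 m, 2 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
Q = ellipsoid_dual_quadric((0.5, 0.5, 0.5), np.eye(3), np.array([0.0, 0.0, 2.0]))
center, semi_axes, angle = project_to_ellipse(Q, K, np.eye(3), np.zeros(3))
```

In this on-axis sphere example the projected outline is a circle centered at the principal point. Pose estimation in an ellipse-ellipsoid model inverts this relationship: given detected ellipses and their associated map ellipsoids, it searches for the camera pose whose projections best match the detections.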

Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology