3D-Aware Object Localization using Gaussian Implicit Occupancy Function

Vincent Gaudillière, Leo Pauly, Arunkumar Rathinam, Albert Garcia Sanchez, Mohamed Adel Musallam, Djamila Aouada
2023-08-02
Abstract: To automatically localize a target object in an image is crucial for many computer vision applications. To represent the 2D object, ellipse labels have recently been identified as a promising alternative to axis-aligned bounding boxes. This paper further considers 3D-aware ellipse labels, i.e., ellipses which are projections of a 3D ellipsoidal approximation of the object, for 2D target localization. Indeed, projected ellipses carry more geometric information about the object geometry and pose (3D awareness) than traditional 3D-agnostic bounding box labels. Moreover, such a generic 3D ellipsoidal model allows for approximating known to coarsely known targets. We then propose to have a new look at ellipse regression and replace the discontinuous geometric ellipse parameters with the parameters of an implicit Gaussian distribution encoding object occupancy in the image. The models are trained to regress the values of this bivariate Gaussian distribution over the image pixels using a statistical loss function. We introduce a novel non-trainable differentiable layer, E-DSNT, to extract the distribution parameters. Also, we describe how to readily generate consistent 3D-aware Gaussian occupancy parameters using only coarse dimensions of the target and relative pose labels. We extend three existing spacecraft pose estimation datasets with 3D-aware Gaussian occupancy labels to validate our hypothesis. Labels and source code are publicly accessible here: https://cvi2.uni.lu/3d-aware-obj-loc/.
Computer Vision and Pattern Recognition
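The label-generation idea mentioned in the abstract (Gaussian occupancy parameters obtained from coarse target dimensions and a relative pose) can be illustrated with a minimal sketch based on the standard dual-quadric projection of an ellipsoid. This is not the authors' released code. The function name `project_ellipsoid`, the pinhole intrinsics `f`, `cx`, `cy`, and the choice to read the projected ellipse as the 1-sigma contour of the Gaussian (so that the ellipse shape matrix is used directly as the covariance) are all assumptions made for illustration.

```python
def project_ellipsoid(semi_axes, R, t, f, cx, cy):
    """Project an origin-centered ellipsoid (object frame) to an image ellipse.

    semi_axes: (a, b, c), coarse half-dimensions of the target.
    R, t: object-to-camera rotation (3x3 nested lists) and translation.
    f, cx, cy: hypothetical pinhole intrinsics (focal length, principal point).
    Returns (center, shape), the ellipse being
    (x - center)^T shape^{-1} (x - center) = 1.
    """
    a, b, c = semi_axes
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    Rt = [list(R[i]) + [t[i]] for i in range(3)]  # 3x4 matrix [R | t]
    # Projection matrix P = K [R | t].
    P = [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
         for i in range(3)]
    # Dual quadric of the ellipsoid is diag(a^2, b^2, c^2, -1), so the dual
    # conic C = P Q P^T reduces to a weighted sum over the columns of P.
    q = [a * a, b * b, c * c, -1.0]
    C = [[sum(q[k] * P[i][k] * P[j][k] for k in range(4)) for j in range(3)]
         for i in range(3)]
    if C[2][2] >= 0.0:
        raise ValueError("ellipsoid touches or crosses the camera's principal plane")
    center = (C[0][2] / C[2][2], C[1][2] / C[2][2])
    scale = -C[2][2]  # normalize so the bottom-right entry becomes -1
    shape = [[C[i][j] / scale + center[i] * center[j] for j in range(2)]
             for i in range(2)]
    return center, shape


# Unit sphere 10 units in front of the camera, on the optical axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
center, shape = project_ellipsoid(
    (1.0, 1.0, 1.0), identity, (0.0, 0.0, 10.0), 1000.0, 320.0, 240.0
)
# Assumed convention: Gaussian mean = center, Gaussian covariance = shape.
```

For this symmetric test case the projection is a circle centered on the principal point, with squared radius f² r² / (d² − r²), which gives a quick sanity check on the formula. A different contour convention would only rescale the covariance by a constant factor.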
What problem does this paper attempt to address?
The paper addresses how to automatically localize a target object in an image while making the 2D localization both more precise and aware of the object's 3D geometry and pose. Specifically, the authors regress 3D-aware ellipse labels (ellipses obtained by projecting a 3D ellipsoidal approximation of the object into the image) instead of traditional axis-aligned bounding boxes or 3D-agnostic 2D ellipse labels.

### Main Problems and Solutions

1. **Limitations of Traditional Methods**
   - Traditional object detection usually relies on axis-aligned bounding boxes, which capture little of an object's geometry and pose.
   - Existing ellipse regression methods describe the object more tightly than bounding boxes, but the geometric ellipse parameters, in particular the orientation angle, are discontinuous, which makes model training unstable.
2. **The Proposed New Method**
   - **3D-aware Ellipse Labels**: The authors introduce ellipse labels that are projections of 3D ellipsoids into the image and therefore carry more geometric and pose information.
   - **Implicit Gaussian Distribution**: To avoid the discontinuity of angle regression, the ellipse is represented by an implicit bivariate Gaussian distribution encoding object occupancy. The model regresses the values of this distribution over the image pixels, and the ellipse is recovered indirectly from the distribution parameters (mean and covariance matrix).
   - **A New Non-trainable Differentiable Layer (E-DSNT)**: The authors design E-DSNT, a non-trainable differentiable layer that extracts the Gaussian parameters from the regressed heatmap.

### Specific Contributions

1. **Fully Differentiable Object Localization Pipeline**: A new, fully differentiable object localization pipeline that regresses 3D-aware ellipse labels directly from images and achieves state-of-the-art performance.
2. **Generation of 3D-aware Gaussian Occupancy Labels**: A method for generating 3D-aware Gaussian occupancy labels using only the 6-degree-of-freedom relative pose and coarse object dimensions.
3. **Public Labels**: 3D-aware Gaussian occupancy labels (heatmaps, means, and covariances) are generated and released for three existing spacecraft pose estimation datasets.

### Experimental Verification

The authors validate the method on multiple public datasets. The reported results show it outperforming existing methods on several evaluation metrics (such as IoU, Overlap, and Dice) and achieving better accuracy in 3D reconstruction tasks.

### Summary

By combining 3D-aware ellipse labels with an implicit Gaussian occupancy representation, the paper addresses two weaknesses of traditional 2D target localization, namely the limited geometric information in the labels and the discontinuous angle parameter, and improves both localization precision and 3D awareness.
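The step of recovering a mean and covariance from a regressed heatmap, and then an ellipse from those parameters, can be illustrated with a plain-Python sketch of the forward computation. It is not the E-DSNT layer itself, whose normalization and coordinate conventions are not given here. The helper names, the pixel-index coordinate system, and the reading of the ellipse as the 1-sigma contour are assumptions made for illustration.

```python
import math


def gaussian_heatmap(width, height, mean, cov):
    """Sample an unnormalized bivariate Gaussian on a pixel grid."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    heat = []
    for v in range(height):
        row = []
        for u in range(width):
            dx, dy = u - mean[0], v - mean[1]
            # Squared Mahalanobis distance via the closed-form 2x2 inverse.
            m = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
            row.append(math.exp(-0.5 * m))
        heat.append(row)
    return heat


def heatmap_moments(heat):
    """Mean and covariance of a non-negative heatmap read as a distribution."""
    total = sum(sum(row) for row in heat)
    mx = my = 0.0
    for v, row in enumerate(heat):
        for u, h in enumerate(row):
            mx += u * h
            my += v * h
    mx /= total
    my /= total
    sxx = sxy = syy = 0.0
    for v, row in enumerate(heat):
        for u, h in enumerate(row):
            dx, dy = u - mx, v - my
            sxx += h * dx * dx
            sxy += h * dx * dy
            syy += h * dy * dy
    cov = ((sxx / total, sxy / total), (sxy / total, syy / total))
    return (mx, my), cov


def covariance_to_ellipse(cov):
    """Semi-axes and orientation of the 1-sigma contour (assumed convention)."""
    (sxx, sxy), (_, syy) = cov
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(half_trace - root)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return major, minor, angle


# Round trip: synthesize a heatmap, then recover its parameters.
heat = gaussian_heatmap(64, 48, (30.0, 20.0), ((36.0, 8.0), (8.0, 16.0)))
mean, cov = heatmap_moments(heat)
major, minor, angle = covariance_to_ellipse(cov)
```

Both moments are weighted sums over the pixels, so the same computation written with tensor operations is differentiable with respect to the heatmap values. The orientation angle appears only in the last helper, after the regression target, which is the motivation for regressing the Gaussian rather than the angle directly.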