Tame a Wild Camera: In-the-Wild Monocular Camera Calibration

Shengjie Zhu, Abhinav Kumar, Masa Hu, Xiaoming Liu
2023-11-23
Abstract: 3D sensing for monocular in-the-wild images, e.g., depth estimation and 3D object detection, has become increasingly important. However, the unknown intrinsic parameter hinders their development and deployment. Previous methods for the monocular camera calibration rely on specific 3D objects or strong geometry prior, such as using a checkerboard or imposing a Manhattan World assumption. This work solves the problem from the other perspective by exploiting the monocular 3D prior. Our method is assumption-free and calibrates the complete $4$ Degree-of-Freedom (DoF) intrinsic parameters. First, we demonstrate intrinsic is solved from two well-studied monocular priors, i.e., monocular depthmap, and surface normal map. However, this solution imposes a low-bias and low-variance requirement for depth estimation. Alternatively, we introduce a novel monocular 3D prior, the incidence field, defined as the incidence rays between points in 3D space and pixels in the 2D imaging plane. The incidence field is a pixel-wise parametrization of the intrinsic invariant to image cropping and resizing. With the estimated incidence field, a robust RANSAC algorithm recovers intrinsic. We demonstrate the effectiveness of our method by showing superior performance on synthetic and zero-shot testing datasets. Beyond calibration, we demonstrate downstream applications in image manipulation detection & restoration, uncalibrated two-view pose estimation, and 3D sensing. Codes, models, and data will be held in https://github.com/ShngJZ/WildCamera.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the calibration of camera intrinsics from a single in-the-wild image. The authors point out that although 3D perception techniques such as monocular depth estimation and 3D object detection have advanced rapidly on in-the-wild images, their deployment is still limited by unknown camera intrinsics. Traditional monocular calibration methods rely on specific 3D objects or strong geometric priors (such as a checkerboard or the Manhattan World assumption), and these conditions are often not met in in-the-wild images. To solve this problem, the authors calibrate the camera from monocular 3D priors instead. They first show that the intrinsics can be solved from a monocular depth map and a surface normal map, then introduce a new prior called the "incidence field", which parameterizes the camera intrinsics pixel by pixel. A deep neural network learns the incidence field, and a RANSAC algorithm recovers the complete 4 degrees of freedom (4-DoF) intrinsics from the estimated field.

### Main Contributions:

1. **Novel Perspective**: The method approaches monocular camera calibration from the perspective of monocular 3D priors, without making any assumptions about the input image.
2. **Robustness**: The algorithm provides robust monocular intrinsic estimation on in-the-wild images and is extensively benchmarked against baseline methods.
3. **Downstream Applications**: The method is demonstrated on several downstream tasks, including detection and restoration of image cropping and resizing, and uncalibrated two-view pose estimation.

### Method Overview:

1. **Monocular Intrinsic Calibration**: Uses the consistency between monocular depth maps and surface normal maps to estimate the intrinsics. This solution requires low-bias, low-variance depth estimates and suffers from numerical instability.
2. **Incidence Field**: Introduces the incidence field as a new monocular 3D prior. It is a pixel-level parameterization of the intrinsics that is invariant to image cropping and resizing.
3. **Network Training**: Employs a deep neural network to learn the incidence field, then recovers the intrinsics from the estimated incidence field with a RANSAC algorithm.
4. **Downstream Applications**: Demonstrates the method on several 3D perception tasks, such as converting depth maps to point clouds and uncalibrated two-view pose estimation.

### Experimental Results:

- Experiments on multiple public datasets, covering synthetic and zero-shot test settings, show superior performance for in-the-wild monocular camera calibration.
- Compared with existing methods, the method shows higher accuracy and robustness across the tested scenarios.

### Conclusion:

The paper proposes a novel and effective monocular camera calibration method that handles unknown intrinsics in in-the-wild images, supporting the deployment of 3D perception techniques in practical applications.
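
To make the incidence-field idea from the method overview concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera without skew, so the incidence ray at pixel $(u, v)$ is $K^{-1}[u, v, 1]^T = [(u - c_x)/f_x,\ (v - c_y)/f_y,\ 1]^T$. Under that assumption each axis is linear in its two unknowns, and two pixels are enough to hypothesize all four intrinsics. The function names, the 1-pixel inlier threshold, and the iteration count are hypothetical choices made for illustration; in the paper the field comes from a neural network, whereas here it is generated from known intrinsics.

```python
import numpy as np

def incidence_field(fx, fy, cx, cy, width, height):
    """Per-pixel incidence rays K^{-1} [u, v, 1]^T, returned with shape (H, W, 3)."""
    u, v = np.meshgrid(np.arange(width, dtype=float),
                       np.arange(height, dtype=float))
    return np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)

def fit_axis(pix, ray):
    """Solve pix = f * ray + c for (f, c) from two (pixel, ray) samples."""
    f = (pix[0] - pix[1]) / (ray[0] - ray[1])
    return f, pix[0] - f * ray[0]

def ransac_intrinsics(field, iters=200, thresh=1.0, seed=0):
    """Recover (fx, fy, cx, cy) from an incidence field with a simple RANSAC loop.

    Illustrative only: the minimal sample is two pixels, and a hypothesis is
    scored by how many pixels it reprojects to within `thresh` pixels.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = field.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    u, v = u.ravel(), v.ravel()
    # Divide by the z-component so unit-normalized rays are handled as well.
    rx = (field[..., 0] / field[..., 2]).ravel()
    ry = (field[..., 1] / field[..., 2]).ravel()

    best, best_inliers = None, -1
    for _ in range(iters):
        i, j = rng.choice(u.size, size=2, replace=False)
        if np.isclose(rx[i], rx[j]) or np.isclose(ry[i], ry[j]):
            continue  # degenerate pair: same ray component on one axis
        fx, cx = fit_axis(u[[i, j]], rx[[i, j]])
        fy, cy = fit_axis(v[[i, j]], ry[[i, j]])
        # Reprojection error of every pixel under this hypothesis.
        err = np.hypot(fx * rx + cx - u, fy * ry + cy - v)
        inliers = int((err < thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = (fx, fy, cx, cy), inliers
    return best

if __name__ == "__main__":
    true_k = (500.0, 520.0, 32.0, 24.0)
    field = incidence_field(*true_k, width=64, height=48)
    print(ransac_intrinsics(field))
```

Because the rays are stored per pixel, cropping the field array leaves each remaining ray unchanged. Re-running the estimator on a cropped field yields the same focal lengths with the principal point shifted by the crop offset, which is the property the summary refers to when it lists crop and resize detection as a downstream application.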