fCOP: Focal Length Estimation from Category-level Object Priors

Xinyue Zhang, Jiaqi Yang, Xiangting Meng, Abdelrahman Mohamed, Laurent Kneip
2024-09-29
Abstract: In the realm of computer vision, the perception and reconstruction of the 3D world through vision signals heavily rely on camera intrinsic parameters, which have long been a subject of intense research within the community. In practical applications, without a strong scene geometry prior like the Manhattan World assumption or special artificial calibration patterns, monocular focal length estimation becomes a challenging task. In this paper, we propose a method for monocular focal length estimation using category-level object priors. Based on two well-studied existing tasks: monocular depth estimation and category-level object canonical representation learning, our focal solver takes depth priors and object shape priors from images containing objects and estimates the focal length from triplets of correspondences in closed form. Our experiments on simulated and real world data demonstrate that the proposed method outperforms the current state-of-the-art, offering a promising solution to the long-standing monocular focal length estimation problem.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses focal length estimation from a single (monocular) image. Specifically, the authors propose a method that uses category-level object priors to estimate the focal length. Traditional focal length estimation depends on strong scene geometry assumptions or special artificial calibration patterns. In practical applications these assumptions are often not met, which makes monocular focal length estimation challenging.

The main contributions of the paper include:

1. **Proposing a new focal length estimation method**: Using category-level object priors and monocular depth estimation, the authors propose a simple and efficient minimal solver. According to the paper, this is the first time the two have been combined for focal length estimation.
2. **Demonstrating the effectiveness and robustness of the method**: Experiments on simulated and real-world data show that the proposed method outperforms current state-of-the-art monocular focal length estimation methods.

### Importance of focal length estimation

The focal length is one of the camera's intrinsic parameters. It plays a crucial role in computer vision tasks such as 3D reconstruction, Structure from Motion, and visual SLAM, and accurate focal length estimation can significantly improve their performance. However, when only one image is available, traditional focal length estimation methods usually rely on strong assumptions, such as known scene geometry or specific objects, which do not hold in many cases.

### Method overview

The method proposed in the paper proceeds as follows:

1. **Input**: An RGB image containing objects of known categories.
2. **Pre-processing**: Use an existing monocular depth predictor and a Normalized Object Coordinates (NOCs) predictor to obtain the depth and the 3D canonical point of each visible 2D image point.
3. **Geometric relationship constraints**: From the geometric relationship between the 2D image points and the 3D NOCs, establish constraints on the unknown intrinsic parameters and the object pose.
4. **Focal length estimation**: With the proposed fCOP solver, estimate the focal length in closed form from triplets of corresponding points.

### Formula summary

The key formulas involved in the paper are as follows:

- Perspective transformation relationship under the camera model:
  \[ d_i K^{-1}\tilde{x}_i+\epsilon_{d_i}=sR(p_i+\epsilon_{p_i})+t+o_i \]
  where \(K\) is the unknown camera intrinsic matrix, \(s\), \(R\), and \(t\) are the unknown scale, rotation, and translation respectively, \(\epsilon_{d_i}\) and \(\epsilon_{p_i}\) represent depth noise and NOCs noise respectively, and \(o_i\) is the zero vector for inliers and an arbitrary vector for outliers.
- Geometric relationship after eliminating translation (noise-free inliers, taking the difference of two correspondences):
  \[ K^{-1}(d_i\tilde{x}_i - d_j\tilde{x}_j)=sR(p_i - p_j) \]
- Norm relationship after eliminating rotation:
  \[ \|K^{-1}(d_i\tilde{x}_i - d_j\tilde{x}_j)\|=s\|p_i - p_j\| \]
- Expression form of the linear system, one row per pair of points in the triplet \((i, j, k)\):
  \[ \begin{bmatrix} \|p_i - p_j\|^2 & -\|d_i x_i - d_j x_j\|^2\\ \|p_j - p_k\|^2 & -\|d_j x_j - d_k x_k\|^2\\ \|p_i - p_k\|^2 & -\|d_i x_i - d_k x_k\|^2 \end{bmatrix} \begin{bmatrix} s^2\\ 1/f^2 \end{bmatrix} = \begin{bmatrix} (d_i - d_j)^2\\ (d_j - d_k)^2\\ (d_i - d_k)^2 \end{bmatrix} \]
  Each row follows from squaring the norm relationship with \(K^{-1}=\mathrm{diag}(1/f, 1/f, 1)\), that is, with \(x_i\) taken as the 2D image coordinates of \(\tilde{x}_i\) relative to the principal point. The first two components contribute \(\|d_i x_i - d_j x_j\|^2/f^2\) and the third contributes \((d_i - d_j)^2\).
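The linear system in the formula summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a single focal length, zero skew, and a known principal point that has already been subtracted from the pixel coordinates. Each pair \((a, b)\) of the triplet contributes one equation \(s^2\|p_a - p_b\|^2 - \frac{1}{f^2}\|d_a x_a - d_b x_b\|^2 = (d_a - d_b)^2\). The function name `solve_focal_from_triplet` and its argument layout are hypothetical.

```python
import numpy as np


def solve_focal_from_triplet(x, d, p):
    """Estimate (f, s) from three 2D-3D correspondences with depth.

    Hypothetical helper, assuming:
      x: (3, 2) pixel coordinates relative to the principal point
      d: (3,)   predicted depths of the three image points
      p: (3, 3) predicted NOCs (canonical 3D) points
    Returns (f, s), or None if the solution is not physically valid.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)

    # Depth-scaled image points d_i * x_i.
    dx = d[:, None] * x

    # One equation per pair of points in the triplet.
    pairs = [(0, 1), (1, 2), (0, 2)]
    A = np.array([
        [np.sum((p[a] - p[b]) ** 2), -np.sum((dx[a] - dx[b]) ** 2)]
        for a, b in pairs
    ])
    rhs = np.array([(d[a] - d[b]) ** 2 for a, b in pairs])

    # Least-squares solution for the unknowns [s^2, 1/f^2].
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    s2, inv_f2 = sol

    # Reject non-positive solutions, e.g. when all three depths are equal
    # and the right-hand side vanishes.
    if s2 <= 0 or inv_f2 <= 0:
        return None
    return 1.0 / np.sqrt(inv_f2), np.sqrt(s2)
```

Because the abstract describes noisy depth and shape priors and the camera model includes an outlier term \(o_i\), a solver like this would normally be run on many sampled triplets inside a robust scheme such as RANSAC; that outer loop is left out of the sketch. When the three points have equal depth, the right-hand side is zero and the focal length cannot be recovered from this system, which is why the sketch returns `None` in that case.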