LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion

Meenakshi Subhash Chippa, Prakash Chandra Chhipa, Kanjar De, Marcus Liwicki, Rajkumar Saini
2024-10-08
Abstract: Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods first remove the distortion and then learn the vision task, which makes correction a multi-step process that often compromises performance. Recent work on mitigating perspective distortion (MPD) leverages the Möbius transform to synthesize perspective distortions without estimating camera parameters. However, the Möbius transform requires tuning multiple interdependent parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method leveraging the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation complemented with experiments to demonstrate that LCM, with fewer parameters, approximates MPD. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches state-of-the-art performance in mitigating perspective distortion on multiple benchmarks, namely Imagenet-PD, Imagenet-E, and Imagenet-X. Furthermore, LCM integrates seamlessly with person re-identification and improves its performance. Source code is publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses the impact of **perspective distortion (PD)** on computer vision applications. Perspective distortion can cause significant changes in the shape, size, orientation, angles, and spatial relationships of visual elements in an image, which degrades the performance of computer vision tasks. Accurately estimating the intrinsic and extrinsic camera parameters needed to correct perspective distortion is difficult, and existing methods usually involve a multi-step process that leads to performance degradation.

#### Main problems include:

1. **Impact of perspective distortion**: Perspective distortion can significantly change the geometric features in an image, making computer vision tasks such as object recognition and detection difficult.
2. **Limitations of existing methods**: Current perspective distortion correction methods are usually multi-step. They first correct the distortion and then perform task-specific learning, which is complex and prone to performance degradation.
3. **High computational complexity**: Existing methods based on the Möbius transform are effective, but they require tuning multiple interdependent parameters and involve complex arithmetic operations, resulting in high computational complexity.

### Solutions proposed in the paper

To solve the above problems, the authors propose the **Log Conformal Maps (LCM)** method, which uses the logarithmic function to approximate perspective distortion and has the following advantages:

- **Reduced parameters**: LCM uses fewer parameters than the Möbius transform, simplifying parameter tuning.
- **Low computational complexity**: LCM avoids the complex arithmetic operations required by the Möbius transform, reducing computational complexity.
- **Robustness**: The authors report that LCM outperforms standard models and matches state-of-the-art performance in mitigating perspective distortion on Imagenet-PD, Imagenet-E, and Imagenet-X.
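To make the parameter-count comparison concrete, here is a minimal sketch contrasting a general Möbius transform with a logarithmic map of the form \(\log(kz + c)\). The Möbius form \((az + b)/(cz + d)\) with \(ad - bc \neq 0\) is the standard textbook definition. The function names `mobius` and `log_map` and all parameter values are illustrative assumptions, not taken from the paper or its repository.

```python
import cmath


def mobius(z, a, b, c, d):
    """General Möbius transform (az + b) / (cz + d).

    Four complex parameters, coupled by the constraint ad - bc != 0.
    """
    if a * d - b * c == 0:
        raise ValueError("degenerate Möbius transform: ad - bc must be non-zero")
    return (a * z + b) / (c * z + d)


def log_map(z, k, c):
    """Logarithmic map log(kz + c), principal branch.

    Two complex parameters; undefined where kz + c == 0.
    """
    w = k * z + c
    if w == 0:
        raise ValueError("log is undefined at kz + c = 0")
    return cmath.log(w)


# Illustrative values only (assumptions, not the paper's settings).
z = complex(0.5, 0.25)
warped_mobius = mobius(z, a=1 + 0j, b=0j, c=0.3 + 0.1j, d=1 + 0j)
warped_log = log_map(z, k=1 + 0j, c=2 + 0j)
print(warped_mobius, warped_log)
```

Both functions map a complex image coordinate to a new complex coordinate; the sketch only shows that the logarithmic form exposes two parameters to tune where the Möbius form exposes four.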
### Formula representation

The mathematical model of perspective distortion can be described by perspective projection:

\[ \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \]

where \((X, Y, Z)\) are the coordinates of a point in 3D space, \((x', y', w')\) are its homogeneous image coordinates, \(f\) is the focal length, and \(w' = Z\) is the scaling factor. The final 2D coordinates \((x, y)\) are obtained by normalization:

\[ x = \frac{x'}{w'} = \frac{fX}{Z}, \quad y = \frac{y'}{w'} = \frac{fY}{Z} \]

The logarithmic conformal transformation used by LCM is defined as:

\[ \Psi(z) = \log(kz + c) \]

where \(k\) and \(c\) are complex numbers and \(z = x + iy\) is the complex image coordinate. This transformation is nonlinear and, for \(k \neq 0\), conformal (angle-preserving) wherever \(kz + c \neq 0\), which lets it approximate perspective distortion.

### Summary

The main contribution of the paper is a new method, Log Conformal Maps (LCM), for mitigating perspective distortion. With fewer parameters and lower computational complexity than the Möbius-based approach, LCM is reported to outperform standard models and to match state-of-the-art performance on multiple benchmark datasets.
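As a small numerical illustration of the projection and logarithmic-map formulas in this summary, the sketch below projects the same 3D offset at two depths and then checks numerically that \(\log(kz + c)\) preserves a right angle at a sample point. The function names, the focal length, and the values of \(k\) and \(c\) are assumptions chosen for illustration; this is not the authors' implementation.

```python
import cmath
import math


def project(X, Y, Z, f=1.0):
    """Pinhole projection: x = fX/Z, y = fY/Z (point must have Z > 0)."""
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


def log_conformal(z, k=1 + 0j, c=2 + 0j):
    """Psi(z) = log(kz + c) on the principal branch (k, c are illustrative)."""
    return cmath.log(k * z + c)


def angle_between(u, v):
    """Signed angle in radians from complex direction u to direction v."""
    return cmath.phase(v / u)


# The same lateral offset X = 1 appears half as large at twice the depth.
near = project(1.0, 0.0, 2.0)
far = project(1.0, 0.0, 4.0)

# Conformality check: take two tiny steps from z0 that meet at 90 degrees,
# map them, and measure the angle between the mapped steps.
z0 = complex(0.3, 0.2)
h = 1e-6
du = log_conformal(z0 + h) - log_conformal(z0)
dv = log_conformal(z0 + 1j * h) - log_conformal(z0)
mapped_angle = angle_between(du, dv)

print(near, far)
print(math.degrees(mapped_angle))  # close to 90
```

The finite-difference check is only approximate: the mapped angle differs from 90 degrees by an amount on the order of the step size `h`.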