Geometry Restoration and Dewarping of Camera-Captured Document Images

Valery Istomin, Oleg Pereziabov, Ilya Afanasyev
2025-01-07
Abstract: This research focuses on developing a method for restoring the topology of digital images of paper documents captured by a camera, using algorithms for detection, segmentation, geometry restoration, and dewarping. Our methodology employs deep learning (DL) for document outline detection, followed by computer vision (CV) to create a topological 2D grid using cubic polynomial interpolation and correct nonlinear distortions by remapping the image. Using classical CV methods makes the document topology restoration process faster and more efficient, as it requires significantly less computation and memory. We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and annotated dataset to demonstrate its efficiency. Our experiments confirm the promise of our methodology and its superiority over existing benchmarks (including mobile apps and popular DL solutions, such as RectiNet, DocGeoNet, and DocTr++), both visually and in terms of document readability via Optical Character Recognition (OCR) and geometry restoration metrics. This paves the way for creating high-quality digital copies of paper documents and enhancing the efficiency of OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
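The abstract names the two classical-CV steps (a 2D grid built with cubic polynomial interpolation, then a remap) without giving formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes the top and bottom page edges are already available as sampled (x, y) points, fits each with a cubic via `np.polyfit`, treats the left and right edges as straight lines joining the curve endpoints, and blends linearly between the two curves to get a source coordinate for every pixel of a rectangular output. The helper names `fit_cubic` and `build_grid` are hypothetical.

```python
import numpy as np


def fit_cubic(points):
    """Fit y = a*x^3 + b*x^2 + c*x + d to (x, y) edge samples (needs >= 4 points)."""
    pts = np.asarray(points, dtype=float)
    return np.poly1d(np.polyfit(pts[:, 0], pts[:, 1], deg=3))


def build_grid(top_pts, bottom_pts, out_w, out_h):
    """Return (map_x, map_y): the source (x, y) to sample for each output pixel.

    Assumption (hypothetical construction): the page is a ruled surface between
    the fitted top and bottom cubics; the side edges are straight lines joining
    the curves' leftmost and rightmost sampled endpoints.
    """
    top, bottom = fit_cubic(top_pts), fit_cubic(bottom_pts)
    top_x = np.asarray(top_pts, dtype=float)[:, 0]
    bot_x = np.asarray(bottom_pts, dtype=float)[:, 0]

    s = np.linspace(0.0, 1.0, out_w)             # horizontal parameter, left -> right
    t = np.linspace(0.0, 1.0, out_h)[:, None]    # vertical parameter, top -> bottom

    # x positions spread uniformly along each edge's sampled span
    xt = top_x.min() + s * (top_x.max() - top_x.min())
    xb = bot_x.min() + s * (bot_x.max() - bot_x.min())
    yt, yb = top(xt), bottom(xb)                 # evaluate the fitted cubics

    # Linear blend between the two curves gives an (out_h, out_w) grid
    map_x = (1 - t) * xt + t * xb
    map_y = (1 - t) * yt + t * yb
    return map_x.astype(np.float32), map_y.astype(np.float32)
```

The resulting float32 `map_x`/`map_y` arrays have one entry per output pixel, which is the layout OpenCV's `cv2.remap` accepts, so the dewarp itself could then be `cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)`.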
What problem does this paper attempt to address?
The problem this paper addresses is how to restore and correct the geometric distortion of camera-captured images of paper documents in order to produce high-quality digital copies. Specifically, the researchers develop an automated method for restoring the topological structure and correcting the nonlinear distortion of such images. The challenges include:

1. **Geometric distortion**: The camera angle, lens distortion, and physical deformation of the paper itself (bending, folding, etc.) distort the geometry of the captured image.
2. **Illumination and shadows**: Uncontrolled capture conditions can cause uneven lighting and shadows in the image.
3. **Inaccurate boundary detection**: Existing deep-learning methods often detect document boundaries imprecisely, which leaves worse distortion in the restored image.

### Research Objectives

To address these challenges, the authors propose a method that combines deep learning (DL) with classical computer vision (CV) techniques, aiming to:

- Use a deep-learning model (such as YOLOv8) to detect the document contour and generate a document mask.
- Use classical CV methods to build a 2D topological grid via cubic polynomial interpolation and correct nonlinear distortion by remapping the image.
- Improve OCR accuracy and document readability, yielding higher-quality restored documents.

### Method Overview

1. **Deep-learning stage**:
   - Use a YOLOv8 model to detect the document boundaries and generate a mask.
2. **Classical computer vision stage**:
   - Construct a curved grid over the document surface via cubic polynomial interpolation.
   - Restore the original document geometry by remapping the input image onto a uniform rectangular grid.

### Main Contributions

1. **A new method for document geometry restoration and dewarping** that first uses deep learning for document contour detection and then applies classical computer vision methods for geometry restoration.
2. **A new automatic dewarping and reconstruction pipeline**, with the framework and an annotated dataset released to the research community to demonstrate its efficiency.
3. **Experimental validation** showing that the method outperforms existing benchmarks (such as RectiNet, DocGeoNet, and DocTr++) in visual quality, OCR text recognition rate, and geometry restoration metrics.

Through this approach, the authors aim to provide an efficient, fast, and resource-light way to produce high-quality digital copies of documents and to improve the efficiency of OCR systems.
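The deep-learning stage outputs a document mask, while the grid construction works from boundary curves. A minimal bridge between the two is sketched below, assuming a roughly upright page and a binary mask already converted to a NumPy array; how the paper actually extracts the outline from the YOLOv8 output is not stated here. It scans every `step`-th column and records the first and last document pixel, giving top- and bottom-edge samples that a cubic fit could consume. The function name `edge_points_from_mask` and the `step` parameter are hypothetical.

```python
import numpy as np


def edge_points_from_mask(mask, step=10):
    """Sample top and bottom boundary points of a binary document mask.

    mask: 2D array, nonzero where the document is (assumed roughly upright).
    Returns two lists of (x, y) tuples: top-edge and bottom-edge samples.
    """
    m = np.asarray(mask).astype(bool)
    top, bottom = [], []
    for x in range(0, m.shape[1], step):
        rows = np.flatnonzero(m[:, x])   # rows covered by the document in this column
        if rows.size:                    # skip columns outside the document
            top.append((x, int(rows[0])))       # first document pixel = top edge
            bottom.append((x, int(rows[-1])))   # last document pixel = bottom edge
    return top, bottom
```

A practical version would probably also drop samples close to the corners, where the top and bottom edges meet the side edges; this sketch does not.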
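The last step of the classical stage remaps the input image onto a uniform rectangular grid. In OpenCV this is typically a single `cv2.remap` call with `cv2.INTER_LINEAR`; the NumPy sketch below spells out the bilinear sampling involved, given per-pixel float source coordinates `map_x`, `map_y` shaped like the desired output. Out-of-range coordinates are clamped to the image border here (close to `cv2.BORDER_REPLICATE`, not OpenCV's default constant border), and the name `remap_bilinear` is hypothetical.

```python
import numpy as np


def remap_bilinear(image, map_x, map_y):
    """Sample `image` at float source coordinates (map_x, map_y) with bilinear weights.

    Output has the shape of map_x (plus channels, if any); coordinates outside
    the image are clamped to the nearest border pixel.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape[:2]
    x = np.clip(map_x, 0, w - 1)
    y = np.clip(map_y, 0, h - 1)
    x0 = np.floor(x).astype(int)         # top-left neighbour
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)       # bottom-right neighbour, kept in bounds
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x - x0                          # fractional offsets in [0, 1)
    fy = y - y0
    if img.ndim == 3:                    # broadcast weights over colour channels
        fx, fy = fx[..., None], fy[..., None]
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy
```

Passing an identity grid (for example from `np.meshgrid(np.arange(w), np.arange(h))`) returns the input unchanged, which is a quick sanity check before feeding in a dewarping grid.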