Abstract: This research focuses on developing a method for restoring the topology of digital images of paper documents captured by a camera, using algorithms for detection, segmentation, geometry restoration, and dewarping. Our methodology employs deep learning (DL) for document outline detection, followed by computer vision (CV) to create a topological 2D grid using cubic polynomial interpolation and correct nonlinear distortions by remapping the image. Using classical CV methods makes the document topology restoration process more efficient and faster, as it requires significantly fewer computational resources and memory. We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and annotated dataset to demonstrate its efficiency. Our experiments confirm the promise of our methodology and its superiority over existing benchmarks (including mobile apps and popular DL solutions, such as RectiNet, DocGeoNet, and DocTr++) both visually and in terms of document readability via Optical Character Recognition (OCR) and geometry restoration metrics. This paves the way for creating high-quality digital copies of paper documents and enhancing the efficiency of OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI

What problem does this paper attempt to address?

This paper addresses how to recover and correct geometric distortion in camera-captured images of paper documents in order to generate high-quality digital copies. Specifically, the researchers focus on an automated method for restoring the topological structure of such images and correcting their non-linear distortion. The challenges include:

1. **Geometric Distortion**: The camera shooting angle, lens distortion, and physical deformation of the paper itself (such as bending or folding) all introduce geometric distortion into the image.
2. **Illumination and Shadow Problems**: An unstable shooting environment may lead to uneven illumination and shadows in the image.
3. **Inaccurate Boundary Detection**: Traditional deep-learning methods have insufficient precision in detecting document boundaries, resulting in more severe image distortion.

### Research Objectives

To address the above challenges, the authors propose a new method that combines deep-learning (DL) and classical computer vision (CV) techniques, aiming to:

- Use a deep-learning model (such as YOLOv8) for document contour detection and generate a document mask.
- Use classical computer vision methods to create a 2D topological grid based on cubic-polynomial interpolation and correct non-linear distortion by remapping the image.
- Improve the accuracy of the OCR system and the readability of the document, ensuring higher-quality recovered documents.

### Method Overview

1. **Deep-Learning Stage**:
   - Use the YOLOv8 model to detect document boundaries and generate a mask.
2. **Classical Computer Vision Stage**:
   - Create a curved grid on the document surface through cubic-polynomial interpolation.
   - Recover the geometric shape of the original document by remapping the input image onto a uniform rectangular grid.

### Main Contributions

1. **Propose a new method for document geometric recovery and dewarping**, which first uses deep learning for document contour detection and then uses classical computer vision methods for geometric recovery.
2. **Design a new automatic dewarping and reconstruction pipeline**, and release the framework and annotated dataset so that the research community can verify its efficiency.
3. **Verify the effectiveness of this method through experiments**, in which it outperforms existing benchmark methods (such as RectiNet, DocGeoNet, and DocTr++) in terms of visual quality, OCR text recognition rate, and geometric recovery metrics.

With this method, the authors aim to provide an efficient, fast, and resource-light way to generate high-quality digital document copies and improve the efficiency of OCR systems.
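The classical CV stage summarized above (a curved grid from cubic-polynomial interpolation, then a remap onto a uniform rectangle) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: the function names `lagrange_cubic`, `build_grid`, and `remap_nearest` are hypothetical, each curved edge is described by exactly four sample points (for example, points picked from the contour of the document mask), the left and right edges are treated as straight, and pixels are fetched with nearest-neighbor lookup rather than proper interpolation.

```python
def lagrange_cubic(points):
    """Return f(x) for the unique cubic through four (x, y) points.

    The four x values must be distinct.
    """
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f


def build_grid(top_pts, bottom_pts, rows, cols):
    """Map each cell of a rows x cols output grid to a source (x, y).

    top_pts / bottom_pts: four (x, y) samples along the top and bottom
    document edges (assumed input). rows and cols must be at least 2.
    """
    top, bottom = lagrange_cubic(top_pts), lagrange_cubic(bottom_pts)
    x0_t, x1_t = top_pts[0][0], top_pts[-1][0]
    x0_b, x1_b = bottom_pts[0][0], bottom_pts[-1][0]
    grid = []
    for r in range(rows):
        v = r / (rows - 1)                  # 0 at top edge, 1 at bottom edge
        row = []
        for c in range(cols):
            u = c / (cols - 1)              # 0 at left edge, 1 at right edge
            xt = x0_t + u * (x1_t - x0_t)   # point on the top curve
            xb = x0_b + u * (x1_b - x0_b)   # point on the bottom curve
            yt, yb = top(xt), bottom(xb)
            # Blend linearly between the two curves (simplifying assumption).
            row.append(((1 - v) * xt + v * xb, (1 - v) * yt + v * yb))
        grid.append(row)
    return grid


def remap_nearest(image, grid):
    """Build the flattened image by sampling `image` at each grid point."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for x, y in row:
            xi = min(max(int(round(x)), 0), w - 1)   # clamp to image bounds
            yi = min(max(int(round(y)), 0), h - 1)
            out_row.append(image[yi][xi])
        out.append(out_row)
    return out


# Tiny demo: a 6x6 "image" whose top document edge bulges downward.
demo_image = [[10 * r + c for c in range(6)] for r in range(6)]
demo_grid = build_grid([(0, 0), (2, 1), (3, 1), (5, 0)],
                       [(0, 5), (2, 5), (3, 5), (5, 5)], rows=4, cols=4)
flattened = remap_nearest(demo_image, demo_grid)
print(flattened[0])
```

On real photographs the same idea would more likely be written with array libraries; for instance, OpenCV's `cv2.remap` accepts per-pixel source coordinate maps and handles sub-pixel interpolation, which the nearest-neighbor lookup here deliberately skips to keep the sketch short.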

Geometry Restoration and Dewarping of Camera-Captured Document Images

Restoring Camera-Captured Distorted Document Images

Adaptive dewarping of severely warped camera-captured document images based on document map generation

UVDoc: Neural Grid-based Document Unwarping

Image Restoration Using Dual-Domain Fusion Network for Rotating Rectangular Synthetic Aperture System

DewarpNet: Single-Image Document Unwarping with Stacked 3D and 2D Regression Networks.

Fourier Document Restoration for Robust Document Dewarping and Recognition

Learning from Documents in the Wild to Improve Document Unwarping

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

Geometric Representation Learning for Document Image Rectification

Arbitrary Warped Document Image Restoration Based on Segmentation and Thin-Plate Splines.

Restoring Warped Document Image Through Segmentation and Full Page Interpolation

Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images.

Rethinking Supervision in Document Unwarping: A Self-consistent Flow-free Approach

Layout-aware Single-image Document Flattening

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

Deep Unrestricted Document Image Rectification.

DocScanner: Robust Document Image Rectification with Progressive Learning

Image Mosaicking for Oversized Documents with a Multi-Camera Rig

A Warped Document Image Mosaicing Method Based on Registration and TRS Transform

DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF