Abstract: Document Layout Analysis (DLA), which aims to automatically recognize the layout structure of the basic or semantic elements within a document, is critical for understanding and reconstructing documents. However, DLA is challenging because document layouts are diverse and complex and vary across languages. Chinese documents in particular call for further theoretical and practical investigation. This paper proposes a framework that fuses Whitespace Smear Cutting (WSC) with a Swin Transformer for layout analysis, mainly of Chinese documents. In the first phase, our proposed WSC algorithm performs an unsupervised segmentation of document images that preserves the delicate edges of a document's connected blocks. In the second phase, a novel semantic segmentation network based on the Swin Transformer performs pixel-level classification. We design a continuous training paradigm for the Swin Transformer, which consists of pre-training on large-scale data and then fine-tuning on specific datasets to adapt the model to their particular data distributions. In the fusion process, pixel-level semantic information guides the integration of connected blocks that share the same semantics, obtained from the WSC algorithm and the semantic segmentation, using rules based on confidence levels and block distributions. This effectively alleviates the challenging problem of bounding box overlap and thus improves the accuracy of semantic classification. Finally, experimental results on a collected dataset of Chinese documents and on the POD dataset demonstrate that the proposed fusion framework is feasible and effective for DLA.
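
The fusion step described above can be illustrated with a minimal sketch. The abstract does not spell out the actual rules, so everything here is an assumption made for illustration: the function name `label_blocks`, the end-exclusive box format, the `min_conf` threshold, and the confidence-weighted voting rule are all hypothetical and are not the authors' method. Each block (standing in for a connected block from WSC) receives the class whose pixels, taken from the segmentation output, accumulate the highest total confidence inside it.

```python
from collections import defaultdict


def label_blocks(blocks, class_map, conf_map, min_conf=0.5):
    """Assign one class per block by confidence-weighted pixel voting.

    blocks:    list of (x0, y0, x1, y1) boxes, end-exclusive (assumed format)
    class_map: 2D list of per-pixel class ids from the segmentation network
    conf_map:  2D list of per-pixel confidences in [0, 1]
    min_conf:  pixels below this confidence do not vote (assumed rule)
    """
    labels = []
    for x0, y0, x1, y1 in blocks:
        votes = defaultdict(float)
        for y in range(y0, y1):
            for x in range(x0, x1):
                if conf_map[y][x] >= min_conf:
                    # Each confident pixel votes with its confidence as weight.
                    votes[class_map[y][x]] += conf_map[y][x]
        # A block with no confident pixels stays unlabeled (None).
        labels.append(max(votes, key=votes.get) if votes else None)
    return labels


# Toy 2x4 "page": left half predicted as class 0, right half as class 1.
class_map = [[0, 0, 1, 1],
             [0, 0, 1, 1]]
conf_map = [[0.9, 0.8, 0.4, 0.9],
            [0.9, 0.6, 0.7, 0.3]]
blocks = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 0, 4, 2)]
labels = label_blocks(blocks, class_map, conf_map)
```

Because every block ends up with a single label, blocks that overlap in the raw segmentation output no longer carry conflicting classes, which is the kind of effect the abstract attributes to the fusion stage.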

A Fusion Framework of Whitespace Smear Cutting and Swin Transformer for Document Layout Analysis
