Abstract: Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advances in large pre-trained computer vision and language models and in graph neural networks have given document image classification many new tools. However, using large pre-trained models usually requires substantial computing resources, which can defeat the cost-saving advantages of automatic document image classification. In this paper we propose an efficient document image classification framework that uses graph convolutional neural networks and incorporates the textual, visual, and layout information of the document. We have rigorously benchmarked the proposed algorithm against several state-of-the-art (SOTA) vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both datasets show that our methods achieve near-SOTA classification performance yet require far fewer computing resources and much less time for model training and inference. This results in solutions that offer better cost advantages, especially in scalable deployment for enterprise applications. We also provide comprehensive comparisons of computing resources, model sizes, and training and inference times between our proposed methods and the baselines. In addition, we delineate the cost per image for our method and the baselines.
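To make the idea of graph convolution over document regions concrete, the following is a minimal sketch in plain Python. It is not the framework proposed in the paper: the mean-aggregation rule, the three-region chain graph, the toy feature vectors (one slot each standing in for textual, visual, and layout information), and the identity weight matrix are all assumptions chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One mean-aggregation graph convolution: ReLU(D^-1 (A + I) H W)."""
    n = len(adj)
    # Add self-loops so each region keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Row-normalise so each node averages over itself and its neighbours.
    a_norm = [[v / sum(row) for v in row] for row in a_hat]
    out = matmul(matmul(a_norm, feats), weight)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in out]


def readout(node_feats: Matrix) -> List[float]:
    """Mean-pool node embeddings into one document-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


# Hypothetical document with three regions connected in a chain: 0 - 1 - 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
# Hypothetical per-region features: [textual, visual, layout].
feats = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
# Identity weights keep the example easy to follow; a real model learns these.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

hidden = gcn_layer(adj, feats, weight)
doc_vec = readout(hidden)
```

In a complete classifier, the pooled document vector would typically be passed to a small classification head; that step, and how regions and edges are actually built from a page, are left out here because the abstract does not specify them.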

Document Image Classification Without Optical Character Recognition

CNN Based Page Object Detection in Document Images

Braille-to-Chinese Translation System Based on Optical Braille Recognition

Document Image Orientation Based on Both Text and Image

Design And Development Of An Ancient Chinese Document Recognition System

Document Image Retrieval Based on Multi-Density Features

Chinese Document Categorization without Dictionary Support and Segmentation Processing

Postprocessing Algorithm for the Optical Recognition of Degraded Characters

Chinese Documents Classification Based on N-Grams

A Chinese Document Categorization System Without Dictionary Support and Segmentation Processing

Chinese Documents Categorization Based on N-gram Information

Efficient Document Image Classification Using Region-Based Graph Neural Network

Towards Mobile Document Image Retrieval for Digital Library

Document Classification Based on Word Vectors

LOCR: Location-Guided Transformer for Optical Character Recognition

General Chinese Document Capture System with Improved Error-Rejecting Module

A Vector Space Model Based Document Classification System

Advanced Topics in Character Recognition and Document Analysis: Research Works in Intelligent Image and Document Research Lab, Tsinghua University

OCR Result Optimization Based on Pattern Matching

Advanced Digital Image Processing Technique based Optical Character Recognition of Scanned Document

Fast Keyword Spotting in Handwritten Chinese Documents Using Index