Abstract: Human-centric perception is the core of diverse computer vision tasks and has been a long-standing research focus. However, previous research studied these human-centric tasks individually, so their performance is largely limited by the size of public task-specific datasets. Recent human-centric methods leverage additional modalities, e.g., depth, to learn fine-grained semantic information; this limits the benefit of pretraining because depth is sensitive to camera views and RGB-D data is scarce on the Internet. This paper improves the data scalability of human-centric pretraining by discarding depth information and exploring the semantic information of RGB images in the frequency space via the Discrete Cosine Transform (DCT). We further propose new annotation-denoising auxiliary tasks with keypoints and DCT maps that force the RGB image feature extractor to learn fine-grained semantic information about human bodies. Our extensive experiments show that, when pretrained on large-scale datasets (COCO and AIC) without depth annotation, our model outperforms state-of-the-art methods by +0.5 mAP (↑) on COCO, +1.4 PCKh (↑) on MPII, and -0.51 EPE (↓) on Human3.6M for pose estimation; by +4.50 mIoU (↑) on Human3.6M for human parsing; by -3.14 MAE (↓) on SHA and -0.07 MAE (↓) on SHB for crowd counting; by +1.1 F1 score (↑) on SHA and +0.8 F1 score (↑) on SHA for crowd localization; and by +0.1 mAP (↑) on Market1501 and +0.8 mAP (↑) on MSMT for person ReID. We also validate the effectiveness of our method on the MPII+NTURGBD datasets.
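To make "semantic information of RGB images in the frequency space" concrete, the following is a minimal sketch of a block-wise orthonormal 2-D DCT-II applied to a grayscale image. It is not the authors' DCT-map construction, which the abstract does not specify. The 8x8 block size (the JPEG convention), the grayscale input, and the function names `dct_1d`, `dct_2d`, and `dct_map` are all illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling: DC term uses sqrt(1/n), the rest sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block: Matrix) -> Matrix:
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # zip(*cols) transposes back to row-major order.
    return [list(r) for r in zip(*cols)]


def dct_map(gray: Matrix, block: int = 8) -> Matrix:
    """Hypothetical helper: apply the 2-D DCT to each block x block tile.

    Assumes the image height and width are multiples of `block`.
    """
    h, w = len(gray), len(gray[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [row[bx:bx + block] for row in gray[by:by + block]]
            coeffs = dct_2d(tile)
            # Write the tile's coefficients back in place (JPEG-style layout).
            for i in range(block):
                for j in range(block):
                    out[by + i][bx + j] = coeffs[i][j]
    return out


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]
    m = dct_map(flat)
    print(round(m[0][0], 6))  # a constant tile puts all energy in the DC term
```

Because the transform is orthonormal, it preserves energy (Parseval), and a flat tile maps entirely to its top-left (DC) coefficient. Low-frequency coefficients therefore summarize coarse body shape, while high-frequency ones capture edges and fine detail.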

Improving the Accuracy of Tesseract 4.0 OCR Engine Using Convolution-Based Preprocessing

OCR accuracy improvement on document images through a novel pre-processing approach

Unknown-box Approximation to Improve Optical Character Recognition Performance

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

Study of Tesseract OCR

Adept: Annotation-denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining

Survey of Post-OCR Processing Approaches

OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment

3D Rendering Framework for Data Augmentation in Optical Character Recognition

TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models

Advancing Post-OCR Correction: A Comparative Study of Synthetic Data

A Tool for Facilitating OCR Postediting in Historical Documents

Confidence-Aware Document OCR Error Detection

OCR Result Optimization Based on Pattern Matching

Efficient, Lexicon-Free OCR using Deep Learning

An Evaluation of OCR Systems Against Adversarial Machine Learning

Statistical Learning for OCR Text Correction

Improved optical character recognition with deep neural network

Confusion network based Video OCR post-processing approach

Neural OCR Post-Hoc Correction of Historical Corpora