Abstract: Data augmentation is usually used by supervised learning approaches for offline writer identification, but such approaches require extra training data and potentially lead to overfitting errors. In this study, a semi-supervised feature learning pipeline was proposed to improve the performance of writer identification by training with extra unlabeled data and the original labeled data simultaneously. Specifically, we proposed a weighted label smoothing regularization (WLSR) method for data augmentation, which assigned the weighted uniform label distribution to the extra unlabeled data. The WLSR method could regularize the convolutional neural network (CNN) baseline to allow more discriminative features to be learned to represent the properties of different writing styles. The experimental results on well-known benchmark datasets (ICDAR2013 and CVL) showed that our proposed semi-supervised feature learning approach could significantly improve the baseline measurement and perform competitively with existing writer identification approaches. Our findings provide new insights into offline writer identification.

What problem does this paper attempt to address?

This paper addresses overfitting and weak feature learning caused by limited labeled data in offline writer identification. Specifically:

1. **Limited labeled data**: Most existing methods rely on supervised learning and need a large amount of labeled data to train the model. In practice, labeled data is expensive to obtain, and the benchmark datasets provide only a limited number of handwritten text images per writer.
2. **Overfitting**: To increase the amount of data, some studies use data augmentation, but this is prone to overfitting, especially on small datasets.
3. **Insufficient feature-learning ability**: With only a small amount of labeled data, traditional supervised methods struggle to learn highly discriminative features, which hurts recognition performance.

To solve these problems, the authors propose a semi-supervised feature-learning pipeline that trains on additional unlabeled data together with the original labeled data. The Weighted Label Smoothing Regularization (WLSR) method lets the model use unlabeled data during training, which reduces the risk of overfitting and improves the features the model learns, thereby improving writer identification performance.

### Specific methods

- **Semi-supervised learning framework**: The framework trains on both labeled and unlabeled data, aiming to learn more effective features from more data.
- **Weighted Label Smoothing Regularization (WLSR)**: For each unlabeled sample, WLSR assigns a weighted uniform label distribution, which regularizes the Convolutional Neural Network (CNN) so that it learns more discriminative features.

Through these methods, the authors aim to significantly improve writer identification performance without adding a large amount of labeled data, and they verify the effectiveness of the approach on the ICDAR2013 and CVL benchmark datasets.
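The idea of giving unlabeled samples a weighted uniform label distribution can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact WLSR formula, so the code assumes the weight is a single scalar that scales the cross-entropy against a uniform target over the K writer classes. The names `labeled_loss`, `unlabeled_loss`, `batch_loss`, and `weight` are hypothetical and chosen for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def labeled_loss(logits, label):
    """Standard cross-entropy for a labeled sample (one-hot target)."""
    return -log_softmax(logits)[label]


def unlabeled_loss(logits, weight=1.0):
    """Cross-entropy against a uniform target over all K classes,
    scaled by an assumed scalar weight (hypothetical form of WLSR)."""
    logp = log_softmax(logits)
    k = len(logits)
    return -weight * sum(logp) / k


def batch_loss(labeled, unlabeled, weight=1.0):
    """Mean loss over a mixed batch.

    labeled:   list of (logits, class_index) pairs
    unlabeled: list of logits
    """
    total = sum(labeled_loss(l, y) for l, y in labeled)
    total += sum(unlabeled_loss(l, weight) for l in unlabeled)
    return total / (len(labeled) + len(unlabeled))


if __name__ == "__main__":
    labeled = [([2.0, 0.5, 0.1, 0.0], 0)]
    unlabeled = [[0.3, 0.2, 0.1, 0.0]]
    print(batch_loss(labeled, unlabeled, weight=0.5))
```

Under this assumed form, the unlabeled term is smallest when the network spreads its prediction evenly across writers, so a confident prediction on an unlabeled sample is penalized. That is the sense in which the extra data acts as a regularizer rather than as additional supervision.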

Semi-supervised Feature Learning For Improving Writer Identification
