Abstract: Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores for natural image generation evaluation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from the variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area.

What problem does this paper attempt to address?

The paper addresses the problem of evaluating Styled Handwritten Text Generation (Styled HTG). Existing scores borrowed from natural image generation, such as the Fréchet Inception Distance (FID), are limited when assessing generated handwriting, because they mainly capture the overall appearance of the image rather than the specific characteristics of the handwriting style. To solve this problem, the authors propose a new evaluation score, the Handwriting Distance (HWD). Its main features are:

1. **Domain-Specific Feature Extraction**: Features are extracted with a convolutional network pre-trained on a synthetic dataset of handwritten text images, instead of a general natural image dataset such as ImageNet.
2. **Perceptual Distance**: The Euclidean distance between features measures the perceptual difference between generated and real handwritten images, rather than a distribution-based distance.
3. **Handling Variable-Length Images**: The score works on text images of different lengths, avoiding the information loss caused by evaluating only a portion of the image.
4. **Numerical Stability**: The score remains numerically stable even with a limited number of samples.

In experiments on multiple word-level and line-level datasets, the authors report that HWD captures subtle differences in handwriting style better than existing scores such as FID. HWD is also reported to be more stable and consistent across datasets of different sizes, making it a useful tool for evaluating handwritten text generation models.
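The idea of comparing handwriting in a feature space with a Euclidean distance can be illustrated with a minimal sketch. It is not the authors' implementation: the trained backbone is not reproduced, random arrays stand in for its output, and the function names (`pool_variable_width`, `style_vector`, `euclidean_style_distance`) are hypothetical. Mean pooling over the width axis and averaging over the images of a set are assumptions made here to obtain fixed-size vectors from variable-length inputs.

```python
import numpy as np


def pool_variable_width(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (channels, width) feature map into a (channels,) vector.

    Assumption: mean pooling over the width axis, so images of any
    length map to a vector of the same size.
    """
    return feature_map.mean(axis=1)


def style_vector(feature_maps) -> np.ndarray:
    """Average the pooled features over all images in one set."""
    return np.mean([pool_variable_width(f) for f in feature_maps], axis=0)


def euclidean_style_distance(real_maps, generated_maps) -> float:
    """Euclidean distance between the style vectors of two image sets."""
    diff = style_vector(real_maps) - style_vector(generated_maps)
    return float(np.linalg.norm(diff))


# Stand-in features: 8 channels, three images of different widths.
rng = np.random.default_rng(0)
real = [rng.normal(0.0, 1.0, size=(8, w)) for w in (12, 20, 31)]
generated_same = [m.copy() for m in real]     # identical features
generated_shifted = [m + 1.0 for m in real]   # every feature offset by 1

d_same = euclidean_style_distance(real, generated_same)
d_shifted = euclidean_style_distance(real, generated_shifted)
print(d_same, d_shifted)
```

In this toy setup, identical feature sets give a distance of zero, and a constant offset of 1 in each of the 8 channels gives a distance of sqrt(8). Because the distance uses only first-order statistics (the mean vectors), it needs no covariance estimate, which is one plausible reason a score of this kind stays stable with few samples.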

HWD: A Novel Evaluation Score for Styled Handwritten Text Generation
