Abstract: This study demonstrates that Large Language Models (LLMs) can transcribe historical handwritten documents with significantly higher accuracy than specialized Handwritten Text Recognition (HTR) software, while being faster and more cost-effective. We introduce an open-source software tool called Transcription Pearl that leverages these capabilities to automatically transcribe and correct batches of handwritten documents using commercially available multimodal LLMs from OpenAI, Anthropic, and Google. In tests on a diverse corpus of 18th/19th century English language handwritten documents, LLMs achieved Character Error Rates (CER) of 5.7 to 7% and Word Error Rates (WER) of 8.9 to 15.9%, improvements of 14% and 32% respectively over specialized state-of-the-art HTR software like Transkribus. Most significantly, when LLMs were then used to correct those transcriptions as well as texts generated by conventional HTR software, they achieved near-human levels of accuracy, that is CERs as low as 1.8% and WERs of 3.5%. The LLMs also completed these tasks 50 times faster and at approximately 1/50th the cost of proprietary HTR programs. These results demonstrate that when LLMs are incorporated into software tools like Transcription Pearl, they provide an accessible, fast, and highly accurate method for mass transcription of historical handwritten documents, significantly streamlining the digitization process.

What problem does this paper attempt to address?

The paper asks how large language models (LLMs) can be used to transcribe handwritten historical documents efficiently, cheaply, and accurately, and thereby simplify the digitization of historical manuscripts. It addresses four aspects:

1. **Improving transcription accuracy**: On handwritten English documents from the 18th and 19th centuries, LLMs transcribe more accurately than specialized Handwritten Text Recognition (HTR) software. They reach character error rates (CER) of 5.7% to 7% and word error rates (WER) of 8.9% to 15.9%, improvements of 14% and 32% respectively over existing HTR software such as Transkribus. Because these are error rates, lower values are better.
2. **Cost and efficiency**: LLMs complete the task 50 times faster and at approximately 1/50th the cost of proprietary HTR software, which makes large-scale transcription projects more economically viable.
3. **No pre-processing or fine-tuning required**: Traditional HTR methods usually require complex image pre-processing and a large amount of labeled data for fine-tuning, whereas LLMs can handle diverse handwritten documents "out of the box", which greatly lowers the barrier to entry.
4. **Automatic correction**: LLMs can not only generate initial transcriptions but also correct transcriptions produced by other LLMs or by HTR software, raising quality to near-human accuracy (CER as low as 1.8% and WER of 3.5%).

In conclusion, the paper aims to show the potential of LLMs for transcribing historical handwritten documents, offering a fast, economical, and accurate approach that substantially advances the digitization of historical sources.
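The CER and WER figures quoted above are edit-distance metrics. As a minimal sketch, assuming the standard definitions (Levenshtein distance divided by the length of the reference, counted in characters for CER and in whitespace-separated words for WER), they can be computed as follows. The function names are illustrative, and any text normalization the paper applies before scoring (case, punctuation, whitespace) is not described here and is not reproduced.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the paper's corpus.
    truth = "the quick brown fox"
    guess = "the quick brwn fox"
    print(f"CER = {cer(truth, guess):.3f}, WER = {wer(truth, guess):.3f}")
```

A single wrong character costs one character edit but a whole word, so WER is typically higher than CER on the same transcription, which matches the pattern in the reported ranges.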

Unlocking the Archives: Using Large Language Models to Transcribe Handwritten Historical Documents

Handwriting Recognition in Historical Documents with Multimodal LLM

CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models

The Sound of Healthcare: Improving Medical Transcription ASR Accuracy with Large Language Models

Advancing radiology practice and research: harnessing the potential of large language models amidst imperfections

If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History

Scrambled text: training Language Models to correct OCR errors using synthetic data

TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text

Large language models for extracting histopathologic diagnoses from electronic health records

Transcribing Medieval Manuscripts for Machine Learning

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

Handwritten Text Recognition for Documentary Medieval Manuscripts

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

Reference-Based Post-OCR Processing with LLM for Diacritic Languages

A tailored Handwritten-Text-Recognition System for Medieval Latin

Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach

Under the Surface: Tracking the Artifactuality of LLM-Generated Data

PHD: Pixel-Based Language Modeling of Historical Documents

Large language models can effectively extract stroke and reperfusion audit data from medical free-text discharge summaries

Apprentices to Research Assistants: Advancing Research with Large Language Models

Information Extraction from Historical Well Records Using A Large Language Model