Automation of historical weather data rescue

Y. Zhang, R. E. Sieber
DOI: https://doi.org/10.1002/gdj3.261
2024-09-28
Geoscience Data Journal
Abstract: Data rescuers worldwide have been trying to retrieve millions of valuable historical weather records so that the observations they contain are preserved, searchable, analysable and machine readable. Most of the records are handwritten, in print or cursive script. To date, automatic transcription has not been reliable or accurate enough on handwritten data, so most historical records are still transcribed manually. Recent attempts have integrated artificial intelligence (AI) to transcribe the records automatically, but the results have not been promising. There is currently no end‐to‐end workflow for automatically transcribing historical handwritten tabular records into digital datasets. We propose a workflow that uses AI to automate the handwriting transcription process, and we test it on historical climate records from the Data Rescue: Archives and Weather (DRAW) project. The workflow is composed of five steps: (1) image pre‐processing, (2) text line segmentation, (3) bounding box detection, (4) AI‐enabled optical character recognition (OCR) and (5) layout re‐arrangement. These steps are modular so that they can accommodate future advances (e.g., new image training data, better layout detectors). We hope the proposed workflow can serve as an easily replicable guideline for transcribing other historical datasets.
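Because the five steps are described as modular, each can be treated as an interchangeable function. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: each step is a plain callable applied in order by a hypothetical `run_pipeline` helper, and step (5) is illustrated by grouping already transcribed bounding boxes into table rows by their vertical centres, then ordering each row left to right. The names `Box`, `run_pipeline`, `rearrange_layout` and the `row_tol` pixel tolerance are illustrative assumptions. Steps (1) to (4) are not implemented here, and the box coordinates and values are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# Hypothetical record for one detected and transcribed cell:
# top-left corner (x, y) and size (w, h) in pixels, plus the OCR text.
@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float
    text: str = ""

Stage = Callable[[Any], Any]

def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Apply each stage in order; any stage can be swapped for a better one."""
    for stage in stages:
        data = stage(data)
    return data

def rearrange_layout(boxes: List[Box], row_tol: float = 10.0) -> List[List[str]]:
    """Assumed layout step: cluster boxes into rows by vertical centre,
    then sort each row by x so cells come out left to right."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        centre = box.y + box.h / 2
        # Compare with the first box of the current row to limit drift.
        if rows and abs(centre - (rows[-1][0].y + rows[-1][0].h / 2)) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [[b.text for b in sorted(row, key=lambda b: b.x)] for row in rows]

# Toy stand-in for the output of steps (1) to (4) (made-up values).
boxes = [
    Box(120, 52, 40, 18, "29.87"), Box(10, 50, 60, 20, "Jan 1"),
    Box(12, 90, 60, 20, "Jan 2"), Box(118, 93, 40, 18, "29.91"),
]
table = run_pipeline(boxes, [rearrange_layout])
print(table)  # [['Jan 1', '29.87'], ['Jan 2', '29.91']]
```

In a full workflow, functions for pre-processing, line segmentation, box detection and OCR would be placed before `rearrange_layout` in the stage list. Replacing one of them, for example with a better layout detector, then leaves the rest of the pipeline unchanged.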
geosciences, multidisciplinary; meteorology & atmospheric sciences