The Global Historical Climatology Network Monthly Precipitation Dataset, Version 4

Scott Applequist, Imke Durre, Russell Vose
DOI: https://doi.org/10.1038/s41597-024-03457-z
2024-06-16
Scientific Data
Abstract: The Global Historical Climatology Network (GHCN) monthly precipitation dataset contains historical time series for thousands of land surface stations worldwide. Initially released in 1992 and revised in 1998, the dataset has been employed in a variety of applications over the past three decades, including operational monitoring, applied research, and international assessments. This paper describes the data and methods used to compile the latest edition (version 4), which has three major enhancements. The first enhancement is to the station network, which increased in size by a factor of five due to the inclusion of dozens of new source datasets, most notably GHCN Daily (GHCNd). The second improvement is the application of a rule-based algorithm to compare and merge records representing the same location. The third enhancement is to the quality assurance approach, now consisting of 18 new checks based on GHCNd and other operational systems. Updated monthly, the resulting dataset consists of time series of monthly precipitation totals at more than 120,000 worldwide stations, including more than 33,000 active observing sites.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses several limitations of the Global Historical Climatology Network (GHCN) monthly precipitation dataset in order to improve its accuracy and practical usefulness. Specifically, it targets the following issues:

1. **Expand the station network**: Earlier versions of the dataset covered a limited number of stations, which left gaps in spatial coverage. Version 4 increases the number of stations fivefold by incorporating dozens of new source datasets, most notably GHCN Daily (GHCNd), which greatly improves spatial coverage worldwide.
2. **Merge duplicate records**: Different source datasets may contain multiple records for the same location, and these records may overlap partially or completely. To resolve this, the authors apply a rule-based algorithm that compares and merges records representing the same location, so that each location is represented by a single time series.
3. **Enhance quality assurance**: Version 4 introduces 18 new quality checks. These checks are based on those used for GHCNd and other operational systems, and they identify and flag suspicious data points more effectively.
4. **Update frequency and availability**: The dataset is updated monthly and provides monthly precipitation totals for more than 120,000 stations worldwide, including more than 33,000 active observing sites. Researchers can therefore obtain recent precipitation data in a timely manner for applications such as operational monitoring, applied research, and international assessments.

In summary, the main objective of this paper is to construct a more comprehensive, reliable, and user-friendly global monthly precipitation dataset by expanding the station network, merging duplicate records, and strengthening quality control, in order to support a wide range of climate research and applications.

### Formula Explanation

Although this article mainly concerns data processing and quality control, it mentions some specific calculation methods and thresholds that are used to keep the data accurate and consistent. For example, when two station records are compared, the distance between them is calculated as follows:

\[ d = R \cdot \arccos(\sin \phi_1 \cdot \sin \phi_2 + \cos \phi_1 \cdot \cos \phi_2 \cdot \cos(\Delta\lambda)) \]

where:

- \( R \) is the radius of the Earth (6,371 km)
- \( \phi_1 \) and \( \phi_2 \) are the latitudes of the two stations
- \( \Delta\lambda \) is the difference in longitude between the two stations

In addition, some specific thresholds are mentioned in the article, such as:

- If the distance between two stations is less than 10 kilometers, they are considered equivalent.
- If the distance between two stations is between 10 and 100 kilometers, they are considered similar.
- If the distance between two stations is greater than 100 kilometers, they are considered different.

These formulas and thresholds give the record comparison and quality control steps explicit, reproducible criteria.
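
The distance formula and thresholds described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `great_circle_km` and `classify_pair` are hypothetical, the treatment of distances of exactly 10 km and 100 km is an assumption (the text does not say which class the boundaries belong to), and clamping the cosine term is a numerical safeguard added here rather than something taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # radius R used in the formula above


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points given in decimal degrees,
    using the spherical law of cosines."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Rounding can push the value slightly outside [-1, 1]; clamp it so
    # that math.acos does not raise ValueError (safeguard added here).
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def classify_pair(distance_km):
    """Map a distance to the three classes listed above.

    Assumption: exactly 10 km and exactly 100 km both count as "similar".
    """
    if distance_km < 10.0:
        return "equivalent"
    if distance_km <= 100.0:
        return "similar"
    return "different"
```

For example, two stations on the same meridian that are one degree of latitude apart are roughly 111 km from each other, so `classify_pair(great_circle_km(40.0, -105.0, 41.0, -105.0))` falls in the "different" class. For very close stations the arccosine form loses some precision, which is why implementations often prefer the haversine formula; the sketch keeps the form given in the text.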