Spatiotemporal distributions of surface ozone levels in China from 2005 to 2017: A machine learning approach

Riyang Liu,Zongwei Ma,Yang Liu,Yanchuan Shao,Wei Zhao,Jun Bi
DOI: https://doi.org/10.1016/j.envint.2020.105823
IF: 11.8
2020-09-01
Environment International
Abstract:<p>In recent years, ground-level ozone has become a severe ambient pollutant in major urban areas of China, with adverse impacts on population health. However, in-situ measurements of ozone concentration in China before 2013 are scarce, which hinders the assessment of the long-term trends and effects of ozone pollution. In this study, we used daily maximum 8-hour average (MDA8) ozone observations from 2013 to 2017 combined with concurrent ozone retrievals, aerosol reanalysis, meteorological parameters, and land-use data to establish a nationwide MDA8 prediction model based on the eXtreme Gradient Boosting (XGBoost) algorithm. The model achieves high prediction accuracy compared with other studies, with R<sup>2</sup> values for the by-year, site-based, and sample-based cross-validation (CV) schemes of 0.61, 0.64, and 0.78, respectively, at the daily level. External testing with regional measurements from 2005 to 2012 and nationwide data in 2018 has shown that the model is robust and reliable for historical data prediction, with external model testing R<sup>2</sup> values ranging from 0.60 to 0.87 at the monthly level in different years. Using the final estimator, we obtained nationwide monthly mean ozone concentrations from 2005 to 2012 and daily MDA8 ozone concentrations from 2013 to 2017 at a resolution of 0.1° × 0.1°. According to the average number of days exceeding the standard and the average of the 90th percentile of the MDA8 ozone concentrations, the Beijing-Tianjin-Hebei (BTH), the Yangtze River Delta, the Pearl River Delta, the Jianghan Plain, the Sichuan Basin, and the Northeast Plain regions were identified as pollution hotspots. During the research period, the overall ozone levels fluctuated slightly, and their trends were not spatially continuous. There was a significant increasing trend of 1.37 (95% CI: 0.46, 2.29) μg/m<sup>3</sup>/year in the BTH region between 2013 and 2017. In 2017, 26.24% of the population lived in areas exceeding the Chinese grade II national air quality standard, which shows that ozone pollution has posed an obvious threat to population health in China. Our products will provide reliable support for future long-term nationwide health impact studies and policy-making for pollution control and prevention.</p>
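The abstract relies on four quantities: the MDA8 value, the number of days exceeding the standard, the 90th percentile of MDA8, and the share of the population living in exceeding areas. The following is a minimal sketch of how each could be computed, not the authors' implementation. It assumes complete 24-hour records, 8-hour windows confined to a single calendar day, a nearest-rank percentile, and a Grade II limit of 160 μg/m³ (the value is not stated in the text above and is supplied here as an assumption). All function names are hypothetical.

```python
def mda8(hourly):
    """Daily maximum 8-hour average from 24 hourly ozone values (ug/m3).

    Simplification: only windows that lie fully inside the day are used.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return max(sum(hourly[i:i + 8]) / 8 for i in range(24 - 8 + 1))


def exceedance_days(mda8_values, limit=160.0):
    """Count days whose MDA8 is strictly above the limit (assumed 160 ug/m3)."""
    return sum(1 for v in mda8_values if v > limit)


def percentile_90(values):
    """Nearest-rank 90th percentile (one of several common definitions)."""
    if not values:
        raise ValueError("no values given")
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]


def population_share_exceeding(cells, limit=160.0):
    """Percent of population in grid cells whose ozone metric exceeds the limit.

    `cells` is a list of (ozone_metric, population) pairs, one per grid cell.
    """
    total = sum(pop for _, pop in cells)
    exposed = sum(pop for value, pop in cells if value > limit)
    return 100.0 * exposed / total


# Toy day: an 8-hour afternoon plateau of 200 ug/m3 on a 40 ug/m3 background.
toy_day = [40.0] * 10 + [200.0] * 8 + [40.0] * 6
toy_mda8 = mda8(toy_day)
```

Which per-cell metric the authors compared against the standard when computing the 26.24% figure is not specified in the abstract, so `population_share_exceeding` takes the metric as an input rather than fixing it.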
environmental sciences
What problem does this paper attempt to address?
This paper addresses the lack of long-term spatiotemporal data on surface ozone concentrations in China. Because surface ozone observations in China before 2013 are very limited, long-term trends and impacts have been difficult to assess. The paper therefore uses daily maximum 8-hour average (MDA8) ozone observations from 2013 to 2017, combined with concurrent ozone retrievals, aerosol reanalysis, meteorological parameters, and land-use data, to establish a nationwide MDA8 prediction model based on the XGBoost algorithm. The model is intended to fill the historical data gap and provide high-resolution (0.1° × 0.1°) surface ozone estimates nationwide from 2005 to 2017, supporting future health impact studies and pollution control policy.

### Main Issues:

1. **Lack of Historical Observation Data**: Surface ozone observations before 2013 are very limited, making it difficult to assess long-term trends and impacts.
2. **Need for High-Precision Long-Term Estimates**: High-precision long-term surface ozone concentration estimates are needed for environmental health research and pollution control policy formulation.
3. **Model Stability and Reliability**: A model that can stably predict historical data must be established, and its reliability must be confirmed through external validation.

### Solutions:

1. **Data Integration**: Use observations from 2013 to 2017, combined with satellite ozone retrievals, aerosol reanalysis, meteorological parameters, and land-use data.
2. **Machine Learning Model**: Use the XGBoost algorithm to build the prediction model; it handles complex nonlinear relationships well.
3. **Cross-Validation**: Apply by-year, site-based, and sample-based cross-validation (CV) to assess the model's stability and generalization ability.
4. **External Validation**: Use regional data from 2005 to 2012 and nationwide data from 2018 to further test the model's reliability and generalization ability.

### Main Findings:

1. **Model Performance**: The model performs well on daily, monthly, and seasonal scales, with particularly small errors on monthly and seasonal scales. Daily-level CV R² values are 0.61 (by-year), 0.64 (site-based), and 0.78 (sample-based); external testing R² values range from 0.60 to 0.87 at the monthly level.
2. **Spatiotemporal Distribution**: Beijing-Tianjin-Hebei (BTH), the Yangtze River Delta, the Pearl River Delta, the Jianghan Plain, the Sichuan Basin, and the Northeast Plain are identified as pollution hotspots.
3. **Long-Term Trends**: From 2013 to 2017, ozone concentration in the BTH region increased significantly, by 1.37 μg/m³ per year (95% CI: 0.46, 2.29).
4. **Health Impact**: In 2017, 26.24% of the population lived in areas exceeding China's Grade II national air quality standard, indicating a significant threat to population health from ozone pollution.

Through these methods and findings, this study provides important data support for future atmospheric pollution research and policy formulation.
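
The three cross-validation schemes named under Solutions differ only in what is held out together: individual records (sample-based), all records from a monitoring site (site-based), or all records from a calendar year (by-year). The sketch below illustrates that difference in plain Python on toy data. It is not the authors' code: the fold counts, the record layout, and every function name are assumptions made for illustration, and model fitting is left out.

```python
import random
from collections import defaultdict


def sample_based_folds(records, k, seed=0):
    """Randomly assign individual records to k folds."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def group_folds(records, key, k, seed=0):
    """Assign whole groups (e.g. every record of one site) to k folds."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec[key]].append(i)
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    folds = [[] for _ in range(k)]
    for j, name in enumerate(names):
        folds[j % k].extend(groups[name])
    return folds


def by_year_folds(records):
    """Leave-one-year-out: each fold holds every record from one year."""
    years = defaultdict(list)
    for i, rec in enumerate(records):
        years[rec["year"]].append(i)
    return [years[y] for y in sorted(years)]


def train_test_pairs(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held in range(len(folds)):
        train = [i for f, fold in enumerate(folds) if f != held for i in fold]
        yield train, folds[held]


# Toy data: 6 hypothetical sites x 5 years (2013-2017) x 4 days each.
records = [
    {"site": f"S{s}", "year": y, "day": d}
    for s in range(6)
    for y in range(2013, 2018)
    for d in range(4)
]

site_folds = group_folds(records, "site", k=3)
year_folds = by_year_folds(records)
sample_folds = sample_based_folds(records, k=10)
```

Sample-based CV lets the same site and year appear in both training and test sets, so it tends to give the most optimistic score. Site-based CV tests prediction at unmonitored locations, and by-year CV tests prediction for unobserved years, which matches the ordering of the reported R² values (0.78, 0.64, 0.61).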