How to predict on-road air pollution based on street view images and machine learning: a quantitative analysis of the optimal strategy

Hui Zhong, Di Chen, Pengqin Wang, Wenrui Wang, Shaojie Shen, Yonghong Liu, Meixin Zhu
2024-09-19
Abstract: On-road air pollution exhibits substantial variability over short distances due to emission sources, dilution, and physicochemical processes. Integrating mobile monitoring data with street view images (SVIs) holds promise for predicting local air pollution. However, the choice of algorithm, sampling strategy, and image quality introduces additional errors, and reliable references that quantify these effects are lacking. To bridge this gap, we employed 314 taxis to monitor NO, NO2, PM2.5 and PM10 dynamically and sampled corresponding SVIs, aiming to develop a reliable strategy. We extracted SVI features from ~382,000 streetscape images, which were collected at various angles (0°, 90°, 180°, 270°) and ranges (buffers with radii of 100 m, 200 m, 300 m, 400 m, 500 m). Three machine learning algorithms were also tested alongside the linear land-use regression (LUR) model to explore the influence of algorithm choice. Four typical image quality issues were identified and discussed. Generally, machine learning methods outperform linear LUR for estimating the four pollutants, with the ranking: random forest > XGBoost > neural network > LUR. Compared to single-angle sampling, the averaging strategy is an effective way to avoid the bias caused by insufficient feature capture. Therefore, the optimal sampling strategy is to obtain SVIs within a 100 m radius buffer and extract features using the averaging strategy. This approach achieved estimation results for each aggregation location with absolute errors mostly below 2.5 µg/m³ or ppb. Overexposure, blur, and underexposure led to image misjudgments and incorrect identifications, causing an overestimation of road features and an underestimation of human-activity features, which contributed to inaccurate NO, NO2, PM2.5 and PM10 estimation.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the issue of predicting road air pollution based on Street View Images (SVIs) and machine learning methods. Specifically, the research team achieves this goal through the following points:

1. **Data Collection and Processing**:
   - Using 314 taxis to dynamically monitor four pollutants (NO, NO₂, PM₂.₅, and PM₁₀) in the central area of Guangzhou and collect corresponding street view images.
   - After cleaning and processing the data, 4,948,120 valid records were retained, and the data was spatially aggregated using a 200×200 meter grid.
2. **Street View Image Feature Extraction**:
   - Obtaining street view images for each aggregated location through the Baidu API and extracting image features using the Mask2Former algorithm.
   - An image quality recognition algorithm was used to detect low-quality images with blur, overexposure, or color distortion and classify them.
3. **Machine Learning Model Comparison**:
   - Comparing three machine learning algorithms (XGBoost, Random Forest, Neural Network) with the linear Land Use Regression (LUR) model to evaluate the performance of different algorithms in predicting the concentrations of the four pollutants.
   - The results showed that Random Forest performed the best in predicting all pollutants, followed by XGBoost and Neural Network, while the linear LUR model performed the worst.
4. **Sampling Strategy Optimization**:
   - Exploring the impact of different angles (0°, 90°, 180°, 270°) and ranges (100m to 500m buffer zones) of street view image sampling strategies on prediction errors.
   - The study found that using an average strategy within a 100m radius buffer zone could achieve the best prediction results, significantly reducing prediction errors compared to single-angle sampling strategies.
5. **Impact Analysis of Low-Quality Images**:
   - Identifying and analyzing 16.25% of low-quality images and finding that overexposure and color channel distortion were the main quality issues.
   - Filtering out low-quality images can significantly improve prediction accuracy, especially in the prediction of NO and PM₂.₅.

Through these methods, the research team developed a reliable strategy that can accurately predict road air pollution levels over short distances, providing valuable reference and support for urban planners and policymakers.
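
The spatial aggregation and angle-averaging steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes coordinates have already been projected to metres, and that each street view image has been reduced to per-class pixel fractions (for example by a segmentation model such as Mask2Former). The function names `grid_cell`, `aggregate_by_cell`, and `average_over_angles`, as well as the feature names, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

CELL_SIZE_M = 200.0  # grid size mentioned in the summary (200 x 200 m)


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (metres) to an integer (column, row) cell index."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def aggregate_by_cell(records, cell_size=CELL_SIZE_M):
    """Average all readings that fall into the same grid cell.

    `records` is an iterable of (x_m, y_m, concentration) tuples.
    Assumption: a plain arithmetic mean is used as the cell value.
    """
    buckets = defaultdict(list)
    for x_m, y_m, value in records:
        buckets[grid_cell(x_m, y_m, cell_size)].append(value)
    return {cell: mean(values) for cell, values in buckets.items()}


def average_over_angles(features_by_angle):
    """Average each feature over the available headings (e.g. 0, 90, 180, 270).

    `features_by_angle` maps a heading in degrees to a dict of
    {feature_name: pixel_fraction}. A feature missing at one heading
    is treated as 0.0 for that heading (an assumption of this sketch).
    """
    names = sorted({name for feats in features_by_angle.values() for name in feats})
    n_angles = len(features_by_angle)
    return {
        name: sum(feats.get(name, 0.0) for feats in features_by_angle.values()) / n_angles
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical readings: two fall in cell (0, 0), one in cell (1, 0).
    readings = [(10.0, 10.0, 20.0), (150.0, 199.0, 30.0), (210.0, 10.0, 40.0)]
    print(aggregate_by_cell(readings))

    # Hypothetical per-heading pixel fractions for one location.
    svi = {
        0: {"road": 0.4, "building": 0.2},
        90: {"road": 0.2, "building": 0.4},
        180: {"road": 0.4, "tree": 0.4},
        270: {"road": 0.2, "building": 0.2},
    }
    print(average_over_angles(svi))
```

Averaging over all four headings means that a feature visible in only one direction (here `tree`) still contributes to the location's feature vector, which is the intuition behind preferring the averaging strategy over single-angle sampling.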