Research on Estimation Model of Carbon Stock Based on Airborne LiDAR and Feature Screening

Xuan Liu, Ruirui Wang, Wei Shi, Xiaoyan Wang, Yaoyao Yang
DOI: https://doi.org/10.3390/su16104133
IF: 3.9
2024-05-16
Sustainability
Abstract: The rapid and accurate estimation of forest carbon stock is important for analyzing the carbon cycle. To obtain forest carbon stock efficiently, this paper uses airborne LiDAR data to investigate how well different feature screening methods, combined with machine learning, perform in carbon stock estimation models. First, Spearman's Correlation Coefficient (SCC) and the Extreme Gradient Boosting tree (XGBoost) were used to screen the variables extracted from airborne LiDAR, retaining those more strongly correlated with carbon stock. Then, Bagging, K-nearest neighbor (KNN), and Random Forest (RF) were used to construct the carbon stock estimation model. The results show that height statistical variables are more strongly correlated with carbon stock than density statistical variables are. RF is more suitable for constructing the carbon stock estimation model than the instance-based KNN algorithm. Furthermore, the combination of the XGBoost algorithm and the RF algorithm performs best, with an R² of 0.85 and an MSE of 10.74 on the training set and an R² of 0.53 and an MSE of 21.81 on the testing set. This study demonstrates the effectiveness of statistical feature screening methods and Random Forest for carbon stock estimation model construction. The XGBoost algorithm has a wider applicability for feature screening.
environmental sciences, environmental studies, green & sustainable science & technology
What problem does this paper attempt to address?
The main objective of this paper is to study how effectively airborne LiDAR data and feature selection methods can be applied in forest carbon stock estimation models. Specifically, the authors aim to:

1. **Utilize airborne LiDAR data**: Extract variables related to forest carbon stock, such as statistical variables like average height and maximum height.
2. **Apply feature selection methods**: Use the Spearman correlation coefficient (SCC) and the Extreme Gradient Boosting (XGBoost) algorithm to screen the extracted LiDAR features and identify the variables most strongly related to carbon stock.
3. **Construct carbon stock estimation models**: Build carbon stock estimation models using three machine learning methods (Bagging, K-Nearest Neighbors (KNN), and Random Forest (RF)) and compare their performance.
4. **Evaluate model performance**: Assess model performance using R² and Mean Squared Error (MSE) on the training and testing sets to determine which combination of feature selection method and machine learning model is most suitable for carbon stock estimation.

The results indicate that height statistical variables have a stronger correlation with carbon stock than density statistical variables. Among the tested models, Random Forest (RF) performed best, especially when combined with XGBoost feature selection. The study also validated the effectiveness of statistical feature selection methods and the applicability of the Random Forest algorithm in constructing carbon stock estimation models.
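
The screening and evaluation steps summarized above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the feature names (`height_mean`, `density_d1`), the 0.8 threshold, and the toy plot values are all hypothetical, chosen only to show how Spearman-based screening and the R² and MSE metrics are computed. Model fitting (Bagging, KNN, RF) and XGBoost importance ranking are omitted.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_features(features, target, threshold):
    """Keep feature names whose |rho| with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(spearman(column, target)) >= threshold]


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy, made-up plot data (NOT from the paper).
carbon = [12.0, 18.5, 25.1, 31.0, 40.2]
features = {
    "height_mean": [5.1, 7.3, 9.8, 12.4, 15.0],    # rises with carbon
    "density_d1": [0.40, 0.10, 0.35, 0.20, 0.30],  # weakly related
}
selected = screen_features(features, carbon, threshold=0.8)  # hypothetical cutoff
print(selected)
```

In practice the retained features would be passed to a regressor, and `r2` and `mse` would be computed separately on the training and testing predictions, as the paper does when comparing the model combinations.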