Construction of Membership Functions for Soil Mapping Using the Partial Dependence of Soil on Environmental Covariates Calculated by Random Forest

Canying Zeng, Lin Yang, A-Xing Zhu
DOI: https://doi.org/10.2136/sssaj2016.06.0195
IF: 2.9
2017-01-01
Soil Science Society of America Journal
Core Ideas:
This study develops a method to construct membership functions representing knowledge on soil–environment relationships from partial dependence.
Use of representative samples as training samples is recommended when applying the proposed method.
Training samples (including representative samples and other samples) with good coverage in the environmental feature space would allow Random Forest to obtain more accurate soil maps than using representative samples.
Partial dependence plots generated by Random Forest imply an association between soil and environmental variables.

Abstract: Partial dependence plots generated by Random Forest (RF) imply an association between soil and environmental variables. This study develops a method to construct membership functions representing knowledge of soil–environment relationships from partial dependence. Key parameters were obtained from normalized partial dependence to define class limits and membership gradation. Seven environmental variables were selected on the basis of their variable importance within RF. Two case studies were conducted to test the effectiveness of the method using different training samples. Case 1 used 33 representative locations as training samples and 50 locations as validation samples. Case 2 randomly split all 83 samples into training and validation subsets at a proportion of 2:1; the split was repeated seven times. For each case, the generated membership functions were used for mapping soil subgroups in Heshan, China, under the Soil Landscape Inference Model framework; RF alone was run for comparison. The results showed that mapping accuracy based on the membership functions (78%) was much higher than that of RF only (60%) in Case 1. In Case 2, the mapping accuracies using membership functions (an average of 67%, SD = 6.5%) were not always higher than those of RF (an average of 67%, SD = 8.0%). The constructed membership functions were affected by the training samples. Use of representative training samples is recommended when applying the proposed method. However, training samples (including representative samples and other samples) with good coverage in the environmental feature space would allow RF to obtain more accurate soil maps than using representative samples.
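The idea of turning partial dependence into membership functions can be illustrated with a small sketch. It is not the authors' procedure: the abstract says key parameters for class limits and membership gradation were derived from normalized partial dependence, whereas the sketch below simply min-max rescales each curve to [0, 1] and interpolates it directly. It also assumes that per-covariate memberships are combined with a minimum (limiting-factor) operator and that each location is assigned the class with the highest combined membership. All grids, curve values, class names, and function names are hypothetical.

```python
from bisect import bisect_right


def normalize(pd_values):
    """Min-max rescale a partial dependence curve to [0, 1]."""
    lo, hi = min(pd_values), max(pd_values)
    if hi == lo:
        # A flat curve carries no gradation; treat it as full membership.
        return [1.0 for _ in pd_values]
    return [(v - lo) / (hi - lo) for v in pd_values]


def membership(x, grid, curve):
    """Linearly interpolate the normalized curve at covariate value x.

    Values outside the grid are clamped to the nearest end of the curve.
    """
    if x <= grid[0]:
        return curve[0]
    if x >= grid[-1]:
        return curve[-1]
    i = bisect_right(grid, x) - 1
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return curve[i] + t * (curve[i + 1] - curve[i])


def classify(location, functions):
    """Combine per-covariate memberships with min, then pick the best class."""
    scores = {}
    for soil_class, per_covariate in functions.items():
        scores[soil_class] = min(
            membership(location[name], grid, curve)
            for name, (grid, curve) in per_covariate.items()
        )
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical partial dependence values on hypothetical covariate grids.
elev_grid = [0, 100, 200, 300]
slope_grid = [0, 10, 20, 30]
functions = {
    "A": {
        "elevation": (elev_grid, normalize([0.1, 0.2, 0.6, 0.9])),
        "slope": (slope_grid, normalize([0.2, 0.5, 0.8, 0.4])),
    },
    "B": {
        "elevation": (elev_grid, normalize([0.8, 0.6, 0.3, 0.1])),
        "slope": (slope_grid, normalize([0.7, 0.5, 0.2, 0.1])),
    },
}

best, scores = classify({"elevation": 250, "slope": 20}, functions)
print(best, scores)
```

In practice the partial dependence values would come from a fitted Random Forest rather than being typed in by hand, and the choice of normalization and combination rule would need to follow the paper's own definitions.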