Comparison and application of SOFM, fuzzy c-means and k-means clustering algorithms for natural soil environment regionalization in China

Wenhao Zhao, Jin Ma, Qiyuan Liu, Jing Song, Mats Tysklind, Chengshuai Liu, Dong Wang, Yajing Qu, Yihang Wu, Fengchang Wu
DOI: https://doi.org/10.1016/j.envres.2022.114519
IF: 8.3
2023-01-01
Environmental Research
Abstract: Soil attributes and their environmental drivers exhibit different patterns in different geographical directions, along with distinct regional characteristics. These patterns may strongly affect the migration and transformation of substances such as organic matter and soil elements, as well as the environmental impacts of pollutants. Therefore, regional soil characteristics should be considered in the process of regionalization for environmental management. However, no comprehensive evaluation or systematic classification of the natural soil environment has been established for China. Here, we established an index system for natural soil environmental regionalization (NSER) by combining literature data obtained through bibliometrics with the analytic hierarchy process (AHP). Based on the index system, we collected spatial distribution data for 14 indexes at the national scale. Three clustering algorithms, namely self-organizing feature mapping (SOFM), fuzzy c-means (FCM) and k-means (KM), were then used to classify and define the natural soil environment. We used four cluster validity indexes (CVIs) to evaluate the models: the Davies-Bouldin index (DB), Silhouette index (Sil) and Calinski-Harabasz index (CH) for FCM and KM, and the clustering quality index (CQI) for SOFM. Analysis and comparison of the results showed that when the number of clusters was 13, the FCM clustering algorithm achieved the optimal clustering results (DB = 1.16, Sil = 0.78, CH = 6.77 × 10^6), allowing the natural soil environment of China to be divided into 12 regions with distinct characteristics. Our study provides a set of comprehensive scientific research methods for regionalization research based on spatial data, and it has important reference value for improving soil environmental management based on local conditions in China.
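The model-selection step described in the abstract (cluster the data for several candidate cluster counts, then compare them with a cluster validity index) can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only k-means and the Davies-Bouldin index, runs on synthetic 2-D points rather than the 14 national-scale indexes, and the function names (`farthest_first`, `kmeans`, `davies_bouldin`) and the farthest-first seeding are assumptions made for the example. A lower DB value indicates more compact, better separated clusters.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on tuples; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: centroid = mean of members (unchanged if cluster is empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
        else:
            scatter.append(0.0)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Synthetic stand-in data: three well-separated Gaussian blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(40)]

# Scan candidate cluster counts and keep the one with the lowest DB value.
scores = {}
for k in range(2, 7):
    labels, centroids = kmeans(points, k)
    scores[k] = davies_bouldin(points, labels, centroids)
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

In practice one would likely compare several indexes side by side, as the abstract does with DB, Sil and CH, since a single index can favor a particular cluster shape or count.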
environmental sciences; public, environmental & occupational health