Evaluating Machine-Learning Algorithms for Mapping LULC of the uMngeni Catchment Area, KwaZulu-Natal

Orlando Bhungeni, Ashadevi Ramjatan, Michael Gebreslasie
DOI: https://doi.org/10.3390/rs16122219
IF: 5
2024-06-19
Remote Sensing
Abstract: Analysis of land use/land cover (LULC) in catchment areas is the first step toward safeguarding freshwater resources. LULC information for watersheds has gained popularity in the natural sciences because it helps water resource managers and environmental health specialists develop natural resource conservation strategies based on available quantitative information. Remote sensing is therefore a cornerstone in addressing environmental issues at the catchment level. In this study, the performance of four machine learning algorithms (MLAs), namely Random Forests (RFs), Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Naïve Bayes (NB), was investigated for classifying the catchment into nine relevant classes of the undulating watershed landscape using Landsat 8 Operational Land Imager (L8-OLI) imagery. The assessment of the MLAs was based on visual inspection by the analyst and on commonly used accuracy metrics: user's accuracy (UA), producer's accuracy (PA), overall accuracy (OA), and the kappa coefficient. The MLAs produced good results: RF (OA = 97.02%, Kappa = 0.96), SVM (OA = 89.74%, Kappa = 0.88), ANN (OA = 87%, Kappa = 0.86), and NB (OA = 68.64%, Kappa = 0.58). The results show that the RF model outperformed SVM and ANN by a considerable margin. While NB yielded satisfactory results, its sensitivity to limited training samples may have been the main influence on them. In contrast, the robust performance of RF could be due to its ability to classify high-dimensional data with limited training data.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to evaluate the performance of different machine-learning algorithms in land use/land cover (LULC) classification in the uMngeni River Basin area in KwaZulu-Natal province. Specifically, the researchers selected four machine-learning algorithms: Random Forests (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN) and Naïve Bayes (NB), and compared these algorithms using Landsat 8 Operational Land Imager (L8-OLI) image data.

### Research Background and the Importance of the Problem

1. **Water Resource Protection**: Analyzing land use/land cover (LULC) in the river basin area is the first step in protecting fresh water resources. LULC information is crucial for water resource managers and environmental health experts to develop natural resource conservation strategies based on quantitative information.
2. **Environmental Monitoring**: Remote sensing technology is a core tool for solving environment-related problems at the river-basin level. Data obtained through remote sensing can help monitor and manage the natural environment and ensure the sustainability of ecosystems.
3. **Selection of Classification Algorithms**: Although previous studies have shown that different machine-learning algorithms perform differently in LULC classification, research on the uMngeni River Basin in the specific geographical environment of South Africa is still limited. Therefore, this study aims to fill this gap and evaluate the specific performance of these algorithms in this area.

### Research Objectives

1. **Mapping LULC**: Map the land use/land cover in the uMngeni River Basin according to L8-OLI image data.
2. **Comparing Algorithm Performance**: Compare the performance of four non-parametric pixel-level machine-learning algorithms (NB, RF, SVM, ANN) through supervised classification methods to determine the most suitable algorithm for LULC classification in this river basin.
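The algorithm comparison rests on the accuracy metrics named in the abstract (OA, kappa, UA, PA), all of which derive from a confusion matrix. The following is a minimal sketch of those calculations, not the authors' code. The function name `accuracy_metrics`, the row/column convention (rows are reference classes, columns are classified classes), and the 3-class matrix are illustrative assumptions; the study itself uses nine classes, and its confusion matrices are not reproduced here.

```python
from typing import Dict, List


def accuracy_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute OA, kappa, PA and UA from a square confusion matrix.

    Assumed convention: cm[i][j] is the number of samples whose reference
    class is i and whose classified (mapped) class is j. Every class is
    assumed to occur at least once in both the reference and the map.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    ref_totals = [sum(cm[i]) for i in range(k)]                        # row sums
    map_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # column sums

    overall = sum(diag) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(ref_totals[i] * map_totals[i] for i in range(k)) / (n * n)
    kappa = (overall - expected) / (1 - expected)

    producers = [diag[i] / ref_totals[i] for i in range(k)]  # 1 - omission error
    users = [diag[i] / map_totals[i] for i in range(k)]      # 1 - commission error
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producers_accuracy": producers,
        "users_accuracy": users,
    }


# Hypothetical 3-class confusion matrix (made-up counts, not from the paper).
example_cm = [
    [45, 5, 0],
    [10, 35, 5],
    [0, 0, 50],
]
metrics = accuracy_metrics(example_cm)
print(f"OA = {metrics['overall_accuracy']:.3f}, kappa = {metrics['kappa']:.3f}")
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside OA when classifiers are ranked.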
### Method Overview

1. **Data Acquisition and Pre-processing**: Use the Google Earth Engine platform to acquire and process L8-OLI image data.
2. **Reference Data Collection**: Collect sample data through field surveys and Google Earth Pro high-resolution images for training and validating models.
3. **Variable Correlation Analysis**: Use the Pearson correlation coefficient to analyze the relationships between variables to reduce redundancy and improve computational efficiency.
4. **Classifier Selection and Parameter Tuning**: Describe in detail the working principles of each classifier and their parameter settings.

### Expected Results

Through the above methods, the study is expected to:

- Provide detailed LULC configuration information in the uMngeni River Basin and provide a scientific basis for water resource management strategies.
- Compare the performance of the four machine-learning algorithms and find the most suitable algorithm for LULC classification in this river basin, thus providing a reference for future similar studies.

### Formula Examples

The formulas mentioned in the paper include:

- **Probability formula for Naïve Bayes classification**:
  \[ P(y = k \mid X) = \frac{p(X \mid y = k)\cdot P(y = k)}{p(X)} \]
  where \(y\) is the unassigned variable, \(k\) is the class, and \(X\) is the training sample.
- **Normalized Difference Vegetation Index (NDVI)**:
  \[ NDVI = \frac{NIR - Red}{NIR + Red} \]
- **Normalized Difference Water Index (NDWI)**:
  \[ NDWI = \frac{Green - NIR}{Green + NIR} \]
- **Bare Soil Index (BSI)**:
  \[ BSI = \frac{(Red + SWIR) - (NIR + Blue)}{(Red + SWIR) + (NIR + Blue)} \]

The first formula is the classification rule used by NB; the remaining three define spectral indices that assist in LULC classification.
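To make the NDVI, NDWI, and BSI formulas concrete, the sketch below evaluates them for a single pixel in plain Python. It is illustrative only and not the authors' Google Earth Engine workflow. The function names, the zero-denominator fallback, and the reflectance values are assumptions. The mapping to Landsat 8 OLI bands (Blue = B2, Green = B3, Red = B4, NIR = B5) is standard, but taking SWIR to be SWIR1 (B6) is an assumption, since the summary does not say which SWIR band the paper uses.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); 0.0 when the denominator is zero (assumed fallback)."""
    denom = a + b
    if denom == 0:
        return 0.0
    return (a - b) / denom


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index."""
    return normalized_difference(green, nir)


def bsi(blue: float, red: float, nir: float, swir: float) -> float:
    """Bare Soil Index: ((Red + SWIR) - (NIR + Blue)) / ((Red + SWIR) + (NIR + Blue))."""
    return normalized_difference(red + swir, nir + blue)


# Hypothetical surface-reflectance values for one vegetated pixel
# (illustrative numbers, not taken from the paper).
pixel = {"blue": 0.04, "green": 0.08, "red": 0.06, "nir": 0.42, "swir": 0.18}

print(f"NDVI = {ndvi(pixel['nir'], pixel['red']):.3f}")    # NDVI = 0.750
print(f"NDWI = {ndwi(pixel['green'], pixel['nir']):.3f}")  # NDWI = -0.680
print(f"BSI  = {bsi(pixel['blue'], pixel['red'], pixel['nir'], pixel['swir']):.3f}")  # BSI  = -0.314
```

For this made-up pixel, the high NDVI together with negative NDWI and BSI is the pattern typically expected for dense vegetation. All three indices share the normalized-difference form, so each is bounded between -1 and 1 for non-negative reflectances.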