Abstract: Analysis of land use/land cover (LULC) in catchment areas is the first step toward safeguarding freshwater resources. LULC information at the watershed level has gained popularity in the natural sciences because it helps water resource managers and environmental health specialists develop natural resource conservation strategies based on available quantitative information. Thus, remote sensing is a cornerstone in addressing environmental issues at the catchment level. In this study, the performance of four machine learning algorithms (MLAs), namely Random Forests (RFs), Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Naïve Bayes (NB), was investigated for classifying the catchment into nine relevant classes of the undulating watershed landscape using Landsat 8 Operational Land Imager (L8-OLI) imagery. The assessment of the MLAs was based on visual inspection by the analyst and on commonly used assessment metrics, such as user's accuracy (UA), producer's accuracy (PA), overall accuracy (OA), and the kappa coefficient. The MLAs produced good results: RF (OA = 97.02%, Kappa = 0.96), SVM (OA = 89.74%, Kappa = 0.88), ANN (OA = 87%, Kappa = 0.86), and NB (OA = 68.64%, Kappa = 0.58). The results show that RF outperformed SVM and ANN by a significant margin. While NB yielded satisfactory results, its sensitivity to limited training samples may have primarily influenced these results. In contrast, the robust performance of RF could be due to its ability to classify high-dimensional data with limited training data.
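
The accuracy metrics named in the abstract (OA, PA, UA, and the kappa coefficient) are all derived from a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code. It assumes rows hold the reference (ground-truth) classes and columns hold the classified classes; the function name `accuracy_metrics` and the two-class example matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Compute OA, PA, UA and Cohen's kappa from a square confusion matrix.

    Assumption: confusion[i, j] counts reference samples of class i
    that the classifier assigned to class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)              # correctly classified samples per class
    reference_totals = cm.sum(axis=1)  # row sums
    classified_totals = cm.sum(axis=0) # column sums

    overall = correct.sum() / total
    producers = correct / reference_totals   # PA: 1 - omission error
    users = correct / classified_totals      # UA: 1 - commission error

    # Agreement expected by chance, from the marginal totals.
    expected = (reference_totals * classified_totals).sum() / total**2
    kappa = (overall - expected) / (1.0 - expected)

    return {"OA": overall, "PA": producers, "UA": users, "kappa": kappa}


# Hypothetical two-class example (illustrative counts only).
example = [[45, 5],
           [10, 40]]
metrics = accuracy_metrics(example)
print(metrics)
```

Kappa is lower than OA whenever some agreement would be expected by chance, which is consistent with the pattern in the reported results (for example OA = 68.64% against Kappa = 0.58 for NB).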

What problem does this paper attempt to address?

The paper evaluates how well different machine-learning algorithms perform in land use/land cover (LULC) classification in the uMngeni River Basin in KwaZulu-Natal province. Specifically, the researchers selected four machine-learning algorithms, Random Forests (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Naïve Bayes (NB), and compared them using Landsat 8 Operational Land Imager (L8-OLI) image data.

### Research Background and the Importance of the Problem

1. **Water Resource Protection**: Analyzing land use/land cover (LULC) in the river basin is the first step in protecting freshwater resources. LULC information is crucial for water resource managers and environmental health experts who develop natural resource conservation strategies based on quantitative information.
2. **Environmental Monitoring**: Remote sensing is a core tool for addressing environmental problems at the river-basin level. Remotely sensed data can help monitor and manage the natural environment and support the sustainability of ecosystems.
3. **Selection of Classification Algorithms**: Although previous studies have shown that different machine-learning algorithms perform differently in LULC classification, research on the uMngeni River Basin, in its specific South African geographical setting, is still limited. This study therefore aims to fill that gap by evaluating how these algorithms perform in this area.

### Research Objectives

1. **Mapping LULC**: Map the land use/land cover of the uMngeni River Basin from L8-OLI image data.
2. **Comparing Algorithm Performance**: Compare the performance of four non-parametric, pixel-level machine-learning algorithms (NB, RF, SVM, ANN) under supervised classification to determine the most suitable algorithm for LULC classification in this river basin.

### Method Overview

1. **Data Acquisition and Pre-processing**: Use the Google Earth Engine platform to acquire and process L8-OLI image data.
2. **Reference Data Collection**: Collect sample data through field surveys and Google Earth Pro high-resolution images for training and validating the models.
3. **Variable Correlation Analysis**: Use the Pearson correlation coefficient to analyze the relationships between variables, reducing redundancy and improving computational efficiency.
4. **Classifier Selection and Parameter Tuning**: Describe in detail the working principles of each classifier and its parameter settings.

### Expected Results

Through the above methods, the study is expected to:

- Provide detailed LULC information for the uMngeni River Basin as a scientific basis for water resource management strategies.
- Compare the performance of the four machine-learning algorithms and identify the most suitable one for LULC classification in this river basin, providing a reference for future similar studies.

### Formula Examples

The formulas mentioned in the paper include:

- **Probability formula for Naïve Bayes classification**:
  \[ P(y = k|X)=\frac{p(X|y = k)\cdot P(y = k)}{p(X)} \]
  where \(y\) is the unassigned variable, \(k\) is the class, and \(X\) is the training sample.
- **Normalized Difference Vegetation Index (NDVI)**:
  \[ NDVI=\frac{NIR - Red}{NIR + Red} \]
- **Normalized Difference Water Index (NDWI)**:
  \[ NDWI=\frac{Green - NIR}{Green + NIR} \]
- **Bare Soil Index (BSI)**:
  \[ BSI=\frac{(Red + SWIR)-(NIR + Blue)}{(Red + SWIR)+(NIR + Blue)} \]

The Naïve Bayes formula gives the posterior class probability used by the NB classifier. The three indices are spectral features calculated from the image bands to assist in LULC classification.
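
The three spectral indices, and the Pearson correlation step from the method overview, can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the study's Google Earth Engine workflow: the helper names (`normalized_ratio`, `ndvi`, `ndwi`, `bsi`) are hypothetical, the band arrays are made-up reflectance values, and pixels with a zero denominator are set to NaN as a design choice of this example.

```python
import numpy as np


def normalized_ratio(numerator, denominator):
    """Element-wise numerator / denominator, with NaN where denominator is 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(numerator, denominator).shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return normalized_ratio(nir - red, nir + red)


def ndwi(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR)."""
    green, nir = np.asarray(green, float), np.asarray(nir, float)
    return normalized_ratio(green - nir, green + nir)


def bsi(red, swir, nir, blue):
    """BSI = ((Red + SWIR) - (NIR + Blue)) / ((Red + SWIR) + (NIR + Blue))."""
    red, swir = np.asarray(red, float), np.asarray(swir, float)
    nir, blue = np.asarray(nir, float), np.asarray(blue, float)
    return normalized_ratio((red + swir) - (nir + blue),
                            (red + swir) + (nir + blue))


# Made-up surface reflectance for four pixels (illustrative values only).
blue = np.array([0.05, 0.08, 0.10, 0.04])
green = np.array([0.20, 0.10, 0.12, 0.06])
red = np.array([0.10, 0.12, 0.20, 0.03])
nir = np.array([0.50, 0.30, 0.25, 0.02])
swir = np.array([0.30, 0.25, 0.35, 0.01])

features = np.vstack([ndvi(nir, red), ndwi(green, nir), bsi(red, swir, nir, blue)])

# Pearson correlation between the three index layers (rows are variables).
# Strongly correlated pairs are candidates for removal as redundant.
correlation = np.corrcoef(features)
print(np.round(correlation, 2))
```

With real imagery, each band would be a 2-D raster flattened to one row per variable before calling `np.corrcoef`, and NaN pixels would need to be masked out first.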

Evaluating Machine-Learning Algorithms for Mapping LULC of the uMngeni Catchment Area, KwaZulu-Natal
