Abstract: The accuracy of a classifier, whether or not it is an ensemble, is directly influenced by the training data used in learning. In remote sensing, training data mislabeling is inevitable and poses a major challenge. This article proposes a versatile data cleaning method that handles the mislabeling problem by exploiting ensemble concepts to identify and then eliminate or correct mislabeled training data. A powerful ensemble method, random forest (RF), is at the core of our filter design and helps to distinguish mislabeled data from uncorrupted data more accurately. The major contribution of this work lies in the explicit use of the hypothesis margin as a decision criterion to identify and eliminate or correct mislabeled training data in an ensemble learning framework. Another key development that makes our algorithm superior to existing approaches is a design that prevents rare class instances from being mistaken for class noise. This fundamental aspect makes our data cleaning system particularly suitable for remote sensing classification tasks, which usually suffer from both mislabeling and class imbalance. The effectiveness of our algorithm is demonstrated on land cover mapping. The generalization performance of two major noise-sensitive supervised classifiers, boosting and $K$-nearest neighbors (KNN), is strengthened by effective class noise reduction. A comparative analysis is conducted with respect to RF, deep convolutional neural networks (CNNs), and two well-established ensemble-based class noise filters, the majority vote and consensus vote filters. This analysis demonstrates that our approach is more accurate than deep CNNs (1-D CNN, AlexNet, EfficientNet, ResNet50, and ShuffleNet) and the reference ensemble methods.
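To make the margin-based filtering idea concrete, the following is a minimal sketch, not the article's algorithm. It assumes one common definition of a supervised ensemble margin: the votes for the given label minus the largest vote count for any other class, divided by the number of base classifiers. The function names `ensemble_margin` and `clean_labels`, the `threshold` parameter, and the choice to relabel with the most-voted class are all hypothetical. The abstract does not specify how rare class instances are protected, so that safeguard is not modeled here.

```python
from collections import Counter


def ensemble_margin(votes, label):
    """Assumed margin definition, in [-1, 1]:
    (votes for the given label - most votes for any other class) / T."""
    counts = Counter(votes)
    v_label = counts.get(label, 0)
    v_other = max((n for c, n in counts.items() if c != label), default=0)
    return (v_label - v_other) / len(votes)


def clean_labels(all_votes, labels, threshold=0.5, correct=False):
    """Return (index, label) pairs after filtering.

    An instance is treated as suspect when its margin is <= -threshold,
    i.e. the ensemble confidently disagrees with the given label.
    Suspect instances are dropped, or relabeled when correct=True.
    """
    cleaned = []
    for idx, (votes, label) in enumerate(zip(all_votes, labels)):
        if ensemble_margin(votes, label) <= -threshold:
            if correct:
                # Relabel with the class that received the most votes.
                cleaned.append((idx, Counter(votes).most_common(1)[0][0]))
            # Otherwise the suspect instance is eliminated.
        else:
            cleaned.append((idx, label))
    return cleaned


# Toy example: per-instance votes from a hypothetical 10-tree forest.
all_votes = [
    ["a"] * 9 + ["b"],      # strong agreement with label "a"
    ["a"] * 8 + ["b"] * 2,  # labeled "b", but the ensemble says "a"
    ["b"] * 6 + ["a"] * 4,  # weak agreement with label "b"
]
labels = ["a", "b", "b"]

removed = clean_labels(all_votes, labels)
corrected = clean_labels(all_votes, labels, correct=True)
```

In practice the votes for a training instance would typically come from trees that did not see that instance during training (out-of-bag votes in a random forest), so that the margin is not inflated by memorized labels. That choice is also an assumption of this sketch.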
