Abstract: Instance selection is important in data analysis and mining; it filters out unrepresentative, redundant, or noisy data from a given training set so that models can be learned effectively. Various instance selection algorithms have been proposed in the literature, and their potential and applicability in data cleaning and preprocessing have been demonstrated. For multiclass classification datasets, existing instance selection algorithms must deal with all the instances across the different classes simultaneously to produce a reduced training set. In general, a multiclass classification dataset can be regarded as a complex domain problem, which can be addressed effectively with the divide‐and‐conquer principle. In this study, the one‐versus‐all (OVA) and one‐versus‐one (OVO) decomposition approaches were used to decompose a multiclass dataset into multiple binary class datasets. These approaches have been widely employed in classifier construction but have never been considered for instance selection. The instance selection performance of the OVA, OVO, and baseline approaches was assessed and compared on 20 multiclass datasets from different domains in the first study and on five medical domain datasets in a validation study. Furthermore, three instance selection algorithms were compared: IB3, DROP3, and GA. The results demonstrate that performing instance selection with the OVO approach makes the support vector machine (SVM) and k‐nearest neighbour (k‐NN) classifiers perform significantly better than with the OVA and baseline approaches in terms of the area under the ROC curve (AUC), regardless of the instance selection algorithm used. Moreover, the OVO approach provides reasonably good data reduction rates and processing times, both of which are better than those of the OVA approach.
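The decomposition idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the per-subproblem selections are recombined, so taking the union of kept instances is an assumption, and Wilson's edited nearest neighbour rule stands in for IB3, DROP3, or GA purely to keep the example short. All function names here are hypothetical.

```python
from itertools import combinations


def ova_subsets(labels):
    """One-versus-all: one binary problem per class, using every instance."""
    for c in sorted(set(labels)):
        idx = list(range(len(labels)))
        yield c, idx, [1 if labels[i] == c else 0 for i in idx]


def ovo_subsets(labels):
    """One-versus-one: one binary problem per class pair, using only those two classes."""
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        yield (a, b), idx, [1 if labels[i] == a else 0 for i in idx]


def enn_select(points, binary_labels, k=3):
    """Stand-in selector (edited nearest neighbour), NOT IB3/DROP3/GA.

    Keeps a point when a strict majority of its k nearest neighbours
    (squared Euclidean distance) share its binary label.
    """
    keep = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((u - v) ** 2 for u, v in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        votes = [binary_labels[j] for _, j in dists[:k]]
        if votes.count(binary_labels[i]) * 2 > len(votes):
            keep.append(i)
    return keep


def select_instances(X, y, decompose, selector=enn_select):
    """Run the selector on each binary subproblem and merge the results.

    Assumption: an instance is retained if any subproblem keeps it (union).
    """
    kept = set()
    for _, idx, bin_y in decompose(y):
        local = selector([X[i] for i in idx], bin_y)
        kept.update(idx[j] for j in local)  # map local positions back to original indices
    return sorted(kept)
```

Calling `select_instances(X, y, ovo_subsets)` or `select_instances(X, y, ova_subsets)` returns the indices of the retained training instances, which can then be passed to any classifier. A different merge rule (for example, keeping only instances retained by every subproblem) would give a stronger reduction and is equally plausible under the description above.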

Fast Instance Selection for Speeding Up Support Vector Machines

Fast instance selection method for SVM training based on fuzzy distance metric

Clonal selection algorithm based on feature antibodies for feature selection and parameter optimization of support vector machines

Feature Selection and Parameter Optimization for Support Vector Machines: A New Approach Based on Genetic Algorithm with Feature Chromosomes

Instance Importance Based SVM for Solving Imbalanced Data Classification

An Efficient Instance Selection Algorithm to Reconstruct Training Set for Support Vector Machine

A simple and reliable instance selection for fast training support vector machine: Valid Border Recognition

A Fast Training Method for OC-SVM Based on the Random Sampling Lemma

MMSVC: an Efficient Unsupervised Learning Approach for Large-Scale Datasets

A Fast SVM-based Feature Selection Method

Fast extraction strategy of support vector machines

Fast SVM training using edge detection on very large datasets

A hybrid method for speeding SVM training

Instance selection using one‐versus‐all and one‐versus‐one decomposition approaches in multiclass classification datasets

Large-scale support vector machine classification with redundant data reduction

Cluster-oriented instance selection for classification problems

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

Building Sparse Support Vector Machines For Multi-Instance Classification

Feature Selection Via Scaling Factor Integrated Multi-Class Support Vector Machines

Online support vector machines with vectors sieving method

Ant colony optimization edge selection for support vector machine speed optimization