Data as voters: instance selection using approval-based multi-winner voting

Luis Sánchez-Fernández, Jesús A. Fisteus, Rafael López-Zaragoza
2024-05-21
Abstract: We present a novel approach to the instance selection problem in machine learning (or data mining). Our approach is based on recent results on (proportional) representation in approval-based multi-winner elections. In our model, instances play a double role as voters and candidates. The approval set of each instance in the training set (acting as a voter) is defined from the concept of local set, which already exists in the literature. We then select the election winners by using a representative voting rule, and such winners are the data instances kept in the reduced training set. Our experiments show that, for KNN, the rule Simple 2-EJR (a variant of the Simple EJR voting rule that satisfies 2-EJR) outperforms all the state-of-the-art algorithms and all the baselines that we consider in this paper in terms of accuracy vs. reduction. For SVMs, we have obtained slight increases in the average accuracy by using several voting rules that satisfy EJR or PJR compared to the results obtained with the original datasets.
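The approval profile described in the abstract can be sketched in a few lines of Python. This is a minimal sketch that relies on several assumptions the abstract does not state:

- distances are Euclidean;
- the local set follows the usual definition, namely every instance strictly closer to x than x's nearest enemy (the closest instance with a different label);
- each voter approves its whole local set, itself included.

`local_sets` and `approval_counts` are hypothetical helper names, not the authors' code.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def local_sets(X: Sequence[Point], y: Sequence[int]) -> List[Set[int]]:
    """For each instance i, return the indices of its local set.

    Assumed definition: x_j is in the local set of x_i when
    dist(x_i, x_j) < dist(x_i, nearest enemy of x_i), where the nearest
    enemy is the closest instance with a different label. Euclidean
    distance is an assumption of this sketch.
    """
    n = len(X)
    sets: List[Set[int]] = []
    for i in range(n):
        # Distance to the nearest enemy; infinite if no other class exists.
        enemy_dist = min(
            (math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]),
            default=math.inf,
        )
        # Every instance strictly closer than the nearest enemy is in the local
        # set (x_i itself too, whenever its nearest enemy is at positive distance).
        sets.append({j for j in range(n) if math.dist(X[i], X[j]) < enemy_dist})
    return sets


def approval_counts(ballots: Sequence[Set[int]], n_candidates: int) -> List[int]:
    """Number of voters approving each candidate (instance)."""
    counts = [0] * n_candidates
    for ballot in ballots:
        for c in ballot:
            counts[c] += 1
    return counts


if __name__ == "__main__":
    # Toy 1-D data: three class-0 points followed by two class-1 points.
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    y = [0, 0, 0, 1, 1]
    ballots = local_sets(X, y)                # each voter's approval set
    print(ballots)                            # [{0, 1, 2}, {0, 1, 2}, {2}, {3}, {3, 4}]
    print(approval_counts(ballots, len(X)))   # [2, 2, 3, 2, 1]
```

On this toy data, instance 2 sits at the class boundary, so its own ballot shrinks to {2}. It is nevertheless the most approved candidate, because instances 0 and 1 also approve it.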
Machine Learning, Computer Science and Game Theory
What problem does this paper attempt to address?
The paper addresses the problem of instance selection in machine learning, that is, choosing a subset of the training set that reduces its size while preserving classifier accuracy. Specifically, the authors propose a new method based on approval-based multi-winner voting to select data instances from the training set. The core idea is to use the proportional representation guarantees of approval-based voting rules to select instances that represent the whole training set. The main contributions of the paper are:

1. **A new model**: Each data instance acts as both a voter and a candidate. Each instance (as a voter) approves the instances in its "local set", that is, all instances of the same class that are closer to it than its nearest instance of a different class.
2. **Theoretical guarantees**: The authors prove that any data instance in the original training set approved by at least (K+1)/2 instances is correctly classified by a KNN classifier that uses the reduced training set, provided that this reduced set is obtained with a voting rule satisfying (K+1)/2-PJR.
3. **Experimental validation**: For KNN, the Simple 2-EJR rule outperforms all the state-of-the-art instance selection algorithms and baselines considered in the paper in terms of accuracy vs. reduction. For support vector machines (SVMs), several voting rules that satisfy EJR or PJR yield slight increases in average accuracy compared with training on the original datasets.

In summary, the paper offers a novel approach to instance selection and supports it with both theoretical guarantees and experiments.
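The selection step can be illustrated with a proportional approval rule followed by KNN on the elected instances. This sketch does not implement the paper's Simple 2-EJR rule. Instead, it swaps in the Method of Equal Shares (MES), a different rule known to satisfy EJR, under these choices:

- candidates have unit cost;
- each voter starts with a budget of k/n;
- there is no completion step, so it may elect fewer than k instances.

The function names, the hand-written toy ballots (the local sets of five 1-D points), and the use of 1-NN are all illustrative assumptions.

```python
import math
from collections import Counter
from fractions import Fraction
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def mes_rho(budgets: Sequence[Fraction], cost: Fraction = Fraction(1)) -> Optional[Fraction]:
    """Smallest per-voter payment rho with sum(min(b, rho)) == cost, or None if unaffordable."""
    if sum(budgets, Fraction(0)) < cost:
        return None
    paid, remaining = Fraction(0), len(budgets)
    for b in sorted(budgets):
        rho = (cost - paid) / remaining
        if rho <= b:           # every remaining supporter can pay rho
            return rho
        paid += b              # this supporter pays its whole remaining budget
        remaining -= 1
    return None  # not reached when the budgets cover the cost


def method_of_equal_shares(ballots: Sequence[Set[int]], candidates: Iterable[int], k: int) -> List[int]:
    """MES for approval ballots with unit candidate costs and no completion step."""
    candidates = list(candidates)
    n = len(ballots)
    budget = [Fraction(k, n)] * n      # every voter starts with k/n
    elected: List[int] = []
    while len(elected) < k:
        best, best_rho = None, None
        for c in candidates:
            if c in elected:
                continue
            supporters = [i for i in range(n) if c in ballots[i]]
            rho = mes_rho([budget[i] for i in supporters])
            # Pick the affordable candidate needing the lowest per-voter payment;
            # ties go to the candidate listed first.
            if rho is not None and (best_rho is None or rho < best_rho):
                best, best_rho = c, rho
        if best is None:
            break  # nothing affordable: plain MES may stop with fewer than k winners
        for i in range(n):
            if best in ballots[i]:
                budget[i] -= min(budget[i], best_rho)
        elected.append(best)
    return elected


def knn_predict(X_train: Sequence[Tuple[float, ...]], y_train: Sequence[int],
                x: Tuple[float, ...], k: int = 1) -> int:
    """Plain KNN majority vote with Euclidean distance."""
    order = sorted(range(len(X_train)), key=lambda j: math.dist(X_train[j], x))
    return Counter(y_train[j] for j in order[:k]).most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    y = [0, 0, 0, 1, 1]
    # Local sets of the five instances, written out by hand (voter i's approval set).
    ballots = [{0, 1, 2}, {0, 1, 2}, {2}, {3}, {3, 4}]
    winners = method_of_equal_shares(ballots, range(len(X)), k=3)
    print(winners)  # [2, 3]
    X_red, y_red = [X[j] for j in winners], [y[j] for j in winners]
    print([knn_predict(X_red, y_red, x) for x in X])  # [0, 0, 0, 1, 1]
```

With k = 3 on this toy profile, MES keeps only instances 2 and 3, one per class, and 1-NN on those two instances reproduces all five original labels. This only illustrates the voting-then-classifying pipeline. It is not a check of the (K+1)/2-PJR guarantee.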