Abstract: Gathering sufficient instance data to either train algorithm-selection models or understand algorithm footprints within an instance space can be challenging. We propose an approach to generating synthetic instances that are tailored to perform well with respect to a target algorithm belonging to a predefined portfolio but are also diverse with respect to their features. Our approach uses a novelty search algorithm with a linearly weighted fitness function that balances novelty and performance to generate a large set of diverse and discriminatory instances in a single run of the algorithm. We consider two definitions of novelty: (1) with respect to discriminatory performance within a portfolio of solvers; (2) with respect to the features of the evolved instances. We evaluate the proposed method with respect to its ability to generate diverse and discriminatory instances in two domains (knapsack and bin-packing), comparing it to another well-known quality-diversity method, Multi-dimensional Archive of Phenotypic Elites (MAP-Elites), and to an evolutionary algorithm that only evolves for discriminatory behaviour. The results demonstrate that the novelty search method outperforms its competitors in terms of coverage of the space and its ability to generate instances that are diverse regarding the relative size of the "performance gap" between the target solver and the remaining solvers in the portfolio. Moreover, for the knapsack domain, we also show that we are able to generate novel instances in regions of an instance space not covered by existing benchmarks using a portfolio of state-of-the-art solvers. Finally, we demonstrate that the method is robust to different portfolios of solvers (stochastic approaches, deterministic heuristics and state-of-the-art methods), thereby providing further evidence of its generality.
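The linearly weighted fitness described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: novelty is taken as the mean Euclidean distance to the k nearest neighbours in a descriptor space (a common choice in novelty search), performance is taken as the gap between the target solver's score and the best competing score, and the names `novelty_score`, `performance_gap`, `weighted_fitness` and the weight `phi` are hypothetical.

```python
import math


def novelty_score(descriptor, others, k=3):
    """Mean Euclidean distance from `descriptor` to its k nearest neighbours.

    Assumption: the descriptor is either an instance's feature vector or its
    vector of solver performances, matching the two novelty definitions
    mentioned in the abstract.
    """
    if not others:
        return 0.0
    distances = sorted(math.dist(descriptor, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def performance_gap(target_score, other_scores):
    """Gap between the target solver and the best other solver.

    Assumption: higher scores are better, so a positive gap means the
    target solver wins on this instance.
    """
    return target_score - max(other_scores)


def weighted_fitness(novelty, performance, phi=0.5):
    """Linear combination of novelty and performance; phi is hypothetical."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return phi * novelty + (1.0 - phi) * performance


# Toy usage with made-up numbers (not results from the paper).
archive = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
nov = novelty_score((0.0, 0.0), archive, k=2)   # mean of distances 1.0 and 5.0
gap = performance_gap(10.0, [7.0, 8.0])         # 10.0 - 8.0
print(weighted_fitness(nov, gap, phi=0.5))      # prints 2.5
```

In practice the two terms would likely need to be normalised to a common scale before weighting, since a raw distance and a raw performance gap are not directly comparable; the sketch omits that step.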

Instance spaces for machine learning classification

Instance Importance Based SVM for Solving Imbalanced Data Classification

Instance-Ranking: A New Perspective to Consider the Instance Dependency for Classification

Curious instance selection

Machine Learning Capability: A standardized metric using case difficulty with applications to individualized deployment of supervised machine learning

Instance Specific Metric Subspace Learning: A Bayesian Approach

Instance-Based Classification Through Hypothesis Testing

Synthesising Diverse and Discriminatory Sets of Instances using Novelty Search in Combinatorial Domains

Contingency Space: A Semimetric Space for Classification Evaluation

Instance Space Analysis of Search-Based Software Testing

Multi-instance Kernel Learning with Concept Weights of Instance Space

Empirical analysis of performance assessment for imbalanced classification

Instance Selection Improves Geometric Mean Accuracy: A Study on Imbalanced Data Classification

Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance

Exploring the Non-Trivial Knowledge Implicit in Test Instance to Fully Represent Unrestricted Bayesian Classifier

Subspace Ensembles for Classification

PMLB: a large benchmark suite for machine learning evaluation and comparison

Iterative Metric Learning for Imbalance Data Classification

Algorithm selection and instance space analysis for curriculum-based course timetabling

Handling Class Imbalance and Overlap with a Hesitation-based Instance Selection Method

An Efficient Instance Selection Algorithm to Reconstruct Training Set for Support Vector Machine