Dynamic ensemble selection methods for heterogeneous data mining

Chris Ballard,Wenjia Wang
DOI: https://doi.org/10.1109/WCICA.2016.7578244
2016-01-01
Abstract:Big Data is often collected from multiple sources with possibly different features, representations and granularity and hence is defined as heterogeneous data. Such multiple datasets need to be fused together in some ways for further analysis. Data fusion at feature level requires domain knowledge and can be time-consuming and ineffective, but it could be avoided if decision-level fusion is applied properly. Ensemble methods appear to be an appropriate paradigm to do just that as each subset of heterogeneous data sources can be separately used to induce models independently and their decisions are then aggregated by a decision fusion function in an ensemble. This study investigates how heterogeneous data can be used to generate more diverse classifiers to build more accurate ensembles. A Dynamic Ensemble Selection Optimisation (DESO) framework is proposed, using the local feature space of heterogeneous data to increase diversity among classifiers and Simulated Annealing for optimisation. An implementation example of DESO - BaggingDES is provided with Bagging as a base platform of DESO, to test its performance and also explore the relationship between diversity and accuracy. Experiments are carried out with some heterogeneous datasets derived from real-world benchmark datasets. The statistical analyses of the results show that BaggingDES performed significantly better than the baseline method - decision tree, and reasonably better than the classic Bagging.
What problem does this paper attempt to address?