High-dimensional data entity resolution based on ensemble classifying

Yi Liu, Xingchun Diao, Jianjun Cao, Yuling Shang
DOI: https://doi.org/10.3969/j.issn.1001-3695.2018.03.011
2018-01-01
Abstract: To make effective use of rich information and improve entity resolution performance on high-dimensional data, this paper proposes a random-combination ensemble classifier model. It defines performance indicators for the base classifiers, takes the classification success rate and the number of features as the two objectives for optimizing a base classifier, and uses an aggregation function to reduce them to a single-objective optimization problem. Base classifiers are designed with ant colony optimization, which uses the maximal information coefficient between features as heuristic information. The ensemble is composed of the base classifiers with the best diversity, as evaluated by Tanimoto distance, and its output is decided by voting. The method is evaluated on several benchmark datasets, and the results show that it is effective.
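The three building blocks named in the abstract (an aggregated objective, Tanimoto-distance diversity, and voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the form of the aggregation function, so the weighted sum and its `weight` parameter are assumptions. Computing Tanimoto distance over the feature subsets of the base classifiers is likewise an assumption, and all function names are hypothetical. The ant colony search and the maximal information coefficient heuristic are omitted.

```python
from collections import Counter
from itertools import combinations


def aggregate_objective(success_rate, n_selected, n_total, weight=0.9):
    """Assumed weighted-sum aggregation: reward a high classification
    success rate and penalise a large feature subset."""
    return weight * success_rate + (1.0 - weight) * (1.0 - n_selected / n_total)


def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two feature subsets given as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def most_diverse_subsets(candidates, k):
    """Brute-force choice of the k candidates (k >= 2) with the largest
    mean pairwise Tanimoto distance. Assumed selection rule."""
    def mean_distance(group):
        pairs = list(combinations(group, 2))
        return sum(tanimoto_distance(x, y) for x, y in pairs) / len(pairs)

    return max(combinations(candidates, k), key=mean_distance)


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `tanimoto_distance({1, 2, 3}, {2, 3, 4})` compares two subsets that share two of four distinct features, and `majority_vote(["match", "non-match", "match"])` resolves a record pair from three base-classifier outputs. The brute-force selection is only practical for a small pool of candidate base classifiers.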