Selecting Effective Features and Relations for Efficient Multi-Relational Classification.
Jun He, Hongyan Liu, Bo Hu, Xiaoyong Du, Puwei Wang
DOI: https://doi.org/10.1111/j.1467-8640.2010.00359.x
2010-01-01
Computational Intelligence
Abstract: Feature selection is an essential data processing step that removes irrelevant and redundant attributes to achieve shorter learning time, better accuracy, and better comprehensibility. A number of feature selection algorithms have been proposed in both the data mining and machine learning areas. These algorithms are usually applied in a single-table environment, where data are stored in one relational table or one flat file. They are not suitable for a multi-relational environment, where data are stored in multiple tables joined to one another by semantic relationships. To address this problem, we propose a novel approach called FARS that conducts both Feature And Relation Selection for efficient multi-relational classification. With this approach, we not only extend the traditional feature selection method to select relevant features from multiple relations, but also develop a new method to reconstruct the multi-relational database schema and eliminate irrelevant tables, further improving classification performance. The results of experiments conducted on both real and synthetic databases show that FARS can effectively choose a small set of relevant features, thereby significantly enhancing classification efficiency and prediction accuracy.