Database classification for multi-database mining

Xindong Wu, Chengqi Zhang, Shichao Zhang
DOI: https://doi.org/10.1016/j.is.2003.10.001
IF: 3.18
2005-01-01
Information Systems
Abstract: Many large organizations have multiple databases distributed across different branches, which makes multi-database mining an important data mining task. To reduce the cost of searching the data in all of these databases, we need to identify which databases are most likely to be relevant to a given data mining application. This is referred to as database selection. In real-world applications, database selection has to be carried out multiple times to identify the relevant databases for different applications. In particular, a mining task may not refer to any specific application. In this paper, we present an efficient approach for classifying multiple databases based on their mutual similarity. Our approach is application-independent.
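The abstract does not specify the similarity measure or how classes are formed. As a rough illustration only, the Python sketch below assumes that each database is a list of transactions, measures the similarity of two databases as the Jaccard similarity of their item sets, and places two databases in the same class when they are linked by a chain of pairs whose similarity meets a threshold `alpha` (single-linkage grouping via union-find). The names `item_set`, `jaccard`, `classify_databases`, and `alpha`, as well as the sample data, are hypothetical; this is not presented as the authors' algorithm.

```python
from itertools import combinations


def item_set(database):
    """Collect every distinct item appearing in a database of transactions."""
    return set().union(*database)


def jaccard(a, b):
    """Jaccard similarity of two item sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_databases(databases, alpha):
    """Group databases into classes at similarity threshold alpha.

    Assumption (hypothetical): two databases are linked when their item-set
    Jaccard similarity is >= alpha, and classes are the connected components
    of the resulting graph (single linkage).
    """
    names = list(databases)
    items = {name: item_set(databases[name]) for name in names}
    # Union-find parent pointers, one per database.
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Compare every pair of databases once and merge linked ones.
    for a, b in combinations(names, 2):
        if jaccard(items[a], items[b]) >= alpha:
            parent[find(a)] = find(b)

    # Collect members by their root and return classes in a stable order.
    classes = {}
    for name in names:
        classes.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in classes.values())


# Hypothetical branch databases, each a list of transactions.
branches = {
    "D1": [{"bread", "milk"}, {"milk", "eggs"}],
    "D2": [{"bread", "milk", "eggs"}],
    "D3": [{"tv", "radio"}],
    "D4": [{"tv", "laptop"}, {"radio"}],
}
print(classify_databases(branches, alpha=0.5))  # [['D1', 'D2'], ['D3', 'D4']]
```

Nothing in this sketch depends on a particular mining task, so the resulting classes could be reused across applications, in the spirit of the application-independence claimed above. Raising `alpha` yields more, smaller classes (at `alpha=0.7`, D3 and D4 separate because their similarity is 2/3). A stricter variant would require every pair inside a class, not just a chain of pairs, to meet the threshold.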