Model Matching: A Novel Framework to Use Clustering Strategy to Solve the Classification Problem.

Zhiyi Duan, Limin Wang, Minghui Sun
DOI: https://doi.org/10.1109/access.2019.2922199
IF: 3.9
2019-01-01
IEEE Access
Abstract: It is common practice to handle labeled data with classifiers and unlabeled data with clustering. Traditional Bayesian network classifiers (BNC$^{\mathcal{T}}$s) learned from a labeled training set $\mathcal{T}$ directly map an unlabeled test instance into the network structure to calculate the conditional probability for classification. This neglects the information hidden in the unlabeled data and leads to classification bias. To address this issue, we propose a novel learning framework, called model matching, that uses a “clustering” strategy to solve the classification problem. The labeled data are divided into several clusters according to class label to learn a set of BNC$^{\mathcal{T}}$s, and a corresponding set of BNC$^{p}$s is built for each unlabeled test instance. To make a classification, the cross-entropy method is applied to compare the structural similarity between BNC$^{\mathcal{T}}$ and BNC$^{p}$. Extensive experimental results on 46 datasets from the University of California at Irvine (UCI) machine learning repository demonstrate that, for BNCs, model matching helps improve generalization performance and outperforms several state-of-the-art classifiers such as tree-augmented naive Bayes and random forest.
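The abstract describes the matching step only at a high level. The sketch below is a deliberately simplified, hypothetical illustration of the idea, not the authors' algorithm: instead of learning full BNC structures, it summarizes each class-specific model by the mutual information between every attribute pair (computed on that class's cluster of training rows), and summarizes the test instance, under each candidate class, by the pointwise mutual information of its observed attribute pairs. Both profiles are normalized into distributions and compared with cross entropy $H(p,q) = -\sum_k p_k \log q_k$, and the class with the lowest cross entropy is chosen. The function names (`fit`, `predict`, etc.), the Laplace smoothing, and the clipping of negative weights are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical simplification: each "model" is reduced to smoothed value counts,
# not a learned Bayesian network structure.

def fit(X, y):
    """Split labeled rows into one cluster per class and count values in each."""
    d = len(X[0])
    domains = [sorted({row[i] for row in X}) for i in range(d)]
    models = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        single = [Counter(r[i] for r in rows) for i in range(d)]
        pair = {(i, j): Counter((r[i], r[j]) for r in rows)
                for i, j in combinations(range(d), 2)}
        models[c] = (len(rows), single, pair)
    return domains, models

def p1(model, domains, i, v, alpha=1.0):
    """Laplace-smoothed P(X_i = v | C = c)."""
    n, single, _ = model
    return (single[i][v] + alpha) / (n + alpha * len(domains[i]))

def p2(model, domains, i, j, vi, vj, alpha=1.0):
    """Laplace-smoothed P(X_i = vi, X_j = vj | C = c)."""
    n, _, pair = model
    k = len(domains[i]) * len(domains[j])
    return (pair[(i, j)][(vi, vj)] + alpha) / (n + alpha * k)

def class_weight(model, domains, i, j):
    """Class-level dependency: mutual information I(X_i; X_j | C = c), clipped at 0."""
    total = 0.0
    for vi in domains[i]:
        for vj in domains[j]:
            pij = p2(model, domains, i, j, vi, vj)
            total += pij * math.log(
                pij / (p1(model, domains, i, vi) * p1(model, domains, j, vj)))
    return max(total, 0.0)

def instance_weight(model, domains, x, i, j):
    """Instance-level dependency: pointwise MI of the observed pair (x_i, x_j) under class c."""
    pij = p2(model, domains, i, j, x[i], x[j])
    return math.log(pij / (p1(model, domains, i, x[i]) * p1(model, domains, j, x[j])))

def normalize(ws, eps=1e-9):
    """Clip negatives to 0, add eps so every entry is positive, and rescale to sum to 1."""
    ws = [max(w, 0.0) + eps for w in ws]
    s = sum(ws)
    return [w / s for w in ws]

def predict(x, domains, models):
    """Return the class with the lowest cross entropy H(p, q), plus all scores."""
    pairs = list(combinations(range(len(x)), 2))
    scores = {}
    for c, m in models.items():
        q = normalize([class_weight(m, domains, i, j) for i, j in pairs])
        p = normalize([instance_weight(m, domains, x, i, j) for i, j in pairs])
        scores[c] = -sum(pk * math.log(qk) for pk, qk in zip(p, q))
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    X = [("a", "x", "u"), ("a", "x", "u"), ("a", "y", "v"),
         ("b", "y", "v"), ("b", "y", "v"), ("b", "x", "u")]
    y = [0, 0, 0, 1, 1, 1]
    domains, models = fit(X, y)
    print(predict(("a", "x", "u"), domains, models))
```

Clipping negative pointwise mutual information to zero is a design choice made here only so that both profiles are valid probability distributions; the paper's actual construction of BNC$^{p}$ and its cross-entropy comparison may differ.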
What problem does this paper attempt to address?