NMFE-SSCC: Non-negative matrix factorization ensemble for semi-supervised collective classification

Qingyao Wu,Mingkui Tan,Xutao Li,Huaqing Min,Ning Sun
DOI: https://doi.org/10.1016/j.knosys.2015.06.026
IF: 8.139
2015-01-01
Knowledge-Based Systems
Abstract:Collective classification (CC) is a task to jointly classifying related instances of network data. Enabling CC usually improves the performance of predictive models on fully-labeled training networks with large amount of labeled data. However, acquiring such labels can be difficult and costly, and learning a CC classifier with only a few labeled data can lead to poor performance. On the other hand, there are usually large amount of unlabeled data available in practical. This naturally motivates semi-supervised collective classification (SSCC) approaches for leveraging the unlabeled data to improve CC from a sparsely-labeled network. In this paper, we propose a novel non-negative matrix factorization (NMF) based SSCC algorithm, called NMF-SSCC, to effectively learn a data representation by exploiting both labeled and unlabeled data on the network. Our idea is to use matrix factorization to obtain a compact representation of network data which uncovers the class discrimination of the data inferred from the labeled instances and simultaneously respects the intrinsic network structure. To achieve this, we design a new matrix factorization objective function and incorporate a label matrix factorization term as well as a network regularization term into it. An efficient optimization algorithm using the multiplicative updating rules is then developed to solve the new objective function. To further boost the predicting performance, we extend the proposed NMF-SSCC method into an ensemble scheme, called NMFE-SSCC, in terms of building a classification ensemble with a set of NMF-SSCC collective classifiers using different constructed latent graphs. Each NMF-SSCC classifier is learnt from one latent graph generated with various latent linkages for effectively label propagation. Experimental results on real-world data sets have demonstrated the effectiveness of the new methods.
What problem does this paper attempt to address?