Reliable Data Distillation on Graph Convolutional Network

Wentao Zhang, Xupeng Miao, Yingxia Shao, Jiawei Jiang, Lei Chen, Olivier Ruas, Bin Cui
DOI: https://doi.org/10.1145/3318464.3389706
2020-01-01
Abstract: The Graph Convolutional Network (GCN) is a widely used method for learning from graph-based data. However, it does not exploit the unlabeled data to its full potential, which limits its performance. Given pseudo labels for some of the unlabeled data, a GCN can benefit from this extra supervision. Building on Knowledge Distillation and Ensemble Learning, many methods use a teacher-student architecture to make better use of the unlabeled data and thereby improve prediction. However, these methods introduce unnecessary training costs, and the student model becomes highly biased when the teacher's predictions are unreliable. In addition, the gains from the final ensemble are limited by the low diversity of the combined models. We therefore propose Reliable Data Distillation, a reliable-data-driven semi-supervised GCN training method. By defining node reliability and edge reliability in a graph, we make better use of high-quality data and improve graph representation learning. Furthermore, taking both data reliability and data importance into account, we propose a new ensemble learning method for GCN and a novel Self-Boosting SSL (semi-supervised learning) Framework that combines the above optimizations. Finally, our extensive evaluation of Reliable Data Distillation on real-world datasets shows that it outperforms state-of-the-art methods on semi-supervised node classification tasks.
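To make the teacher-student concern concrete (a student trained on unreliable teacher predictions inherits their errors), here is a minimal sketch of one generic way to filter pseudo labels: keep a teacher prediction for an unlabeled node only when its top-class softmax probability reaches a threshold. This is an assumption for illustration, not the paper's definition of node reliability, which the abstract does not give. The function name `select_pseudo_labels`, the threshold `tau`, and the toy logits are all hypothetical, and edge reliability is not modeled.

```python
import math
from typing import Dict, List, Sequence, Set


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one node's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    teacher_logits: Dict[int, Sequence[float]],
    labeled: Set[int],
    tau: float = 0.9,
) -> Dict[int, int]:
    """Hypothetical confidence filter (not the paper's method): return
    {node: predicted_class} for unlabeled nodes whose top-class
    probability under the teacher is at least tau."""
    pseudo: Dict[int, int] = {}
    for node, logits in teacher_logits.items():
        if node in labeled:
            continue  # nodes with ground-truth labels need no pseudo label
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= tau:
            pseudo[node] = best  # confident prediction kept as extra supervision
    return pseudo


if __name__ == "__main__":
    # Toy teacher outputs for four nodes over three classes (made-up values).
    logits = {
        0: [5.0, 0.0, 0.0],  # labeled node: skipped
        1: [4.0, 0.0, 0.0],  # confident: kept as class 0
        2: [0.1, 0.0, 0.2],  # near-uniform: dropped as unreliable
        3: [0.0, 0.0, 6.0],  # confident: kept as class 2
    }
    print(select_pseudo_labels(logits, labeled={0}, tau=0.9))  # {1: 0, 3: 2}
```

A fixed global threshold is the simplest possible choice; it trades coverage (fewer pseudo-labeled nodes) for lower label noise in the student's training set.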