GCF-RD

Runjin Chen, Tong Li, Yanyan Shen, Luyu Qiu, Kaidi Li, Caleb Chen Cao
DOI: https://doi.org/10.1145/3511808.3557331
2022-01-01
Abstract: Relational databases are the main storage model for structured data in most businesses, and a database usually involves multiple tables linked by key-foreign-key relationships. In practice, data analysts often want to pose predictive classification queries over relational databases. To answer such queries, many existing approaches use supervised learning to train classification models, which rely heavily on the availability of sufficient labeled data. In this paper, we propose a novel graph-based contrastive framework for semi-supervised learning on relational databases, achieving promising predictive classification performance with only a handful of labeled examples. Our framework uses contrastive learning to exploit additional supervision signals from large amounts of unlabeled data. Specifically, we develop two contrastive graph views that are 1) advantageous for modeling complex relationships and correlations among structured data in a relational database, and 2) complementary to each other for learning robust representations of the structured data to be classified. We also leverage label information in contrastive learning to mitigate its negative effect on knowledge transfer to the supervised counterpart. We conduct extensive experiments on three real-world relational databases, and the results demonstrate that our framework achieves state-of-the-art predictive performance in settings with limited labeled data, compared with various supervised and semi-supervised learning approaches.