Open-Domain Semi-Supervised Learning via Glocal Cluster Structure Exploitation
Zekun Li, Lei Qi, Yawen Li, Yinghuan Shi, Yang Gao
DOI: https://doi.org/10.1109/tkde.2024.3368529
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Semi-supervised learning (SSL) aims to reduce the heavy reliance of current deep models on costly manual annotation by leveraging a large amount of unlabeled data together with a much smaller set of labeled data. However, most existing SSL methods assume that all labeled and unlabeled data are drawn from the same feature distribution, an assumption that can be impractical in real-world applications. In this study, we take the first step toward systematically investigating the open-domain semi-supervised learning setting, in which the feature distributions of labeled and unlabeled data are mismatched. To address open-domain SSL effectively, we propose a novel framework called GlocalMatch, which exploits both the global and local (i.e., glocal) cluster structure of open-domain unlabeled data. The glocal cluster structure is used in two complementary ways. First, GlocalMatch optimizes a Glocal Cluster Compacting (GCC) objective, which encourages feature representations of the same class, whether within the same domain or across different domains, to move closer to each other. Second, GlocalMatch incorporates a Glocal Semantic Aggregation (GSA) strategy that produces more reliable pseudo-labels by aggregating predictions from neighboring clusters. Extensive experiments demonstrate that GlocalMatch significantly outperforms state-of-the-art SSL methods, achieving superior performance in both in-domain and out-of-domain generalization. The code is released at https://github.com/nukezil/GlocalMatch.
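The abstract states only that GSA aggregates predictions from neighboring clusters to obtain more reliable pseudo-labels; it does not give the exact rule. The sketch below is a hypothetical illustration of that general idea, not the GlocalMatch implementation. It assumes that cluster assignments are already available (for example from k-means over feature embeddings), ranks clusters by the cosine similarity between a sample's feature and each cluster centroid, and blends the sample's own class probabilities with the similarity-weighted mean prediction of its `k` nearest clusters. The function name `aggregate_pseudo_labels` and the parameters `k`, `alpha`, and `threshold` are illustrative assumptions; the released repository is the authoritative reference.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(_unit(a), _unit(b)))


def _cluster_stats(features, probs, assignments, num_clusters):
    """Per-cluster centroid, mean class-probability vector, and size."""
    dim, num_classes = len(features[0]), len(probs[0])
    centroids = [[0.0] * dim for _ in range(num_clusters)]
    mean_probs = [[0.0] * num_classes for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for f, p, c in zip(features, probs, assignments):
        counts[c] += 1
        centroids[c] = [s + x for s, x in zip(centroids[c], f)]
        mean_probs[c] = [s + q for s, q in zip(mean_probs[c], p)]
    for c in range(num_clusters):
        if counts[c]:
            centroids[c] = [s / counts[c] for s in centroids[c]]
            mean_probs[c] = [s / counts[c] for s in mean_probs[c]]
    return centroids, mean_probs, counts


def aggregate_pseudo_labels(
    features: Sequence[Vector],
    probs: Sequence[Vector],
    assignments: Sequence[int],
    num_clusters: int,
    k: int = 2,
    alpha: float = 0.5,
    threshold: float = 0.7,
) -> List[Tuple[int, float, bool]]:
    """Return (pseudo_label, confidence, is_confident) for each unlabeled sample.

    Hypothetical neighbor-cluster aggregation: each sample's own prediction is
    blended with the similarity-weighted mean prediction of its k nearest
    non-empty clusters. alpha is the weight on the sample's own prediction.
    """
    centroids, mean_probs, counts = _cluster_stats(features, probs, assignments, num_clusters)
    results = []
    for f, p in zip(features, probs):
        # Rank non-empty clusters by cosine similarity to this sample's feature.
        nearest = sorted(
            ((_cosine(f, centroids[c]), c) for c in range(num_clusters) if counts[c]),
            reverse=True,
        )[:k]
        weights = [max(sim, 0.0) for sim, _ in nearest]  # drop negatively similar clusters
        total = sum(weights)
        if total > 0:
            neighbor = [
                sum(w * mean_probs[c][j] for w, (_, c) in zip(weights, nearest)) / total
                for j in range(len(p))
            ]
        else:
            neighbor = list(p)  # no similar cluster: fall back to the sample's own prediction
        blended = [alpha * a + (1 - alpha) * b for a, b in zip(p, neighbor)]
        label = max(range(len(blended)), key=blended.__getitem__)
        results.append((label, blended[label], blended[label] >= threshold))
    return results


# Toy example: two 2-D clusters, two classes.
features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
probs = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8], [0.3, 0.7]]
assignments = [0, 0, 1, 1]
for label, conf, keep in aggregate_pseudo_labels(features, probs, assignments, num_clusters=2):
    print(label, round(conf, 3), keep)
```

In the toy example, the second sample's raw prediction slightly favors class 1, but its cluster neighborhood pulls the aggregated pseudo-label to class 0; because the blended confidence stays below `threshold`, it would be left out of a confidence-filtered pseudo-label loss. Blending with `alpha` rather than replacing the prediction outright keeps a sample's own evidence in play when its cluster is impure.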
computer science, information systems; computer science, artificial intelligence; engineering, electrical & electronic