Abstract: In many domains, applying machine learning models effectively requires a large amount of labelled data, which may not be available in advance. Acquiring annotations often demands significant time, effort, and computational resources, which makes labelling challenging. Active learning strategies are pivotal in addressing these challenges, particularly for complex data types such as graphs. Although active learning has been studied extensively for node-level classification, its application to graph-level learning, especially for regression tasks, remains underexplored. We develop a unified active learning framework for annotating graphs and performing graph-level regression on both standard and expanded graphs, the latter being more detailed graph representations. We begin with graph collection and construction. We then embed the graphs into a latent space using various unsupervised and supervised embedding methods. Given such an embedding, the framework becomes task agnostic, and active learning can be performed with any regression method and any query strategy suited to regression. Within this framework, we investigate how different levels of available information affect active and passive learning, e.g., partially available labels and unlabelled test data. Although the framework is domain agnostic, we validate it on a real-world application, software performance prediction, where the goal is to predict the execution time of source code; accordingly, the graphs are constructed as an intermediate representation of the source code. We support our methodology with a real-world dataset to underscore the applicability of our approach. Our experiments reveal that satisfactory performance can be achieved by querying labels for only a small subset of the data. A key finding is that Graph2Vec (an unsupervised embedding approach for graph data) performs best, but only when the features of both the training and test graphs are used. However, Graph Neural Networks (GNNs) are the most flexible embedding technique across the different levels of information, with and without access to labels. In addition, we find that the benefit of active learning increases for larger datasets (more graphs) and for more complex graphs, which is arguably when active learning matters most.
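The abstract does not name a specific query strategy, so what follows is only a minimal sketch, under stated assumptions, of the pool-based loop it describes once the graphs have been embedded: a label (for example, an execution time) is revealed only for graphs that are queried, and each round queries the graph on which a committee of regressors disagrees most. The function names, the bootstrap ridge committee, and the synthetic data standing in for graph embeddings and execution times are hypothetical illustrations, not the authors' implementation; in practice `Z` would be produced by Graph2Vec or a GNN.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the bias term is not penalized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def committee_variance(X_lab, y_lab, X_pool, rng, n_models=10):
    """Query-by-committee score: variance of bootstrap ridge predictions."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_lab), size=len(y_lab))  # bootstrap resample
        preds.append(predict(fit_ridge(X_lab[idx], y_lab[idx]), X_pool))
    return np.var(np.stack(preds), axis=0)


def active_learning(Z, y_oracle, n_init=5, budget=20, seed=0):
    """Pool-based active learning on fixed graph embeddings (hypothetical sketch).

    Z        : (n_graphs, d) embedding matrix, e.g. from Graph2Vec or a GNN
    y_oracle : targets; an entry is read only once its graph is queried
    """
    rng = np.random.default_rng(seed)
    labelled = [int(i) for i in rng.choice(len(Z), size=n_init, replace=False)]
    pool = [i for i in range(len(Z)) if i not in labelled]
    for _ in range(budget):
        scores = committee_variance(Z[labelled], y_oracle[labelled], Z[pool], rng)
        labelled.append(pool.pop(int(np.argmax(scores))))  # query the most uncertain graph
    return labelled, fit_ridge(Z[labelled], y_oracle[labelled])


# Usage with synthetic stand-ins for graph embeddings and execution times.
data_rng = np.random.default_rng(1)
Z = data_rng.normal(size=(200, 8))
y = Z @ data_rng.normal(size=8) + 0.1 * data_rng.normal(size=200)
labelled, w = active_learning(Z, y, n_init=5, budget=20)
print(len(labelled), "graphs labelled out of", len(Z))
```

Because the loop only sees the embedding matrix `Z`, swapping in a different regressor or query strategy (for instance, the predictive variance of a Gaussian process) changes only `committee_variance` and `fit_ridge`, which mirrors the task-agnostic design the abstract describes.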

A Scalable Algorithm for Graph-Based Active Learning

Robust Offline Active Learning on Graphs

A unified active learning framework for annotating graph data for regression task

Active Learning Algorithms for Graphical Model Selection

ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks

A Graph-Based Approach for Active Learning in Regression

Focus on informative graphs! Semi-supervised active learning for graph-level classification

S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification

Compute-Efficient Active Learning

Deep Unsupervised Active Learning on Learnable Graphs

Active Learning for Graphs with Noisy Structures

New Balanced Active Learning Model and Optimization Algorithm

GALAXY: Graph-based Active Learning at the Extreme

Distributed Active Learning

DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification

Transfer Active Learning For Graph Neural Networks

Active Model Selection for Graph-Based Semi-Supervised Learning

A Structural-Clustering Based Active Learning for Graph Neural Networks

Unsupervised Active Learning Based on Hierarchical Graph-Theoretic Clustering

Class-Balanced and Reinforced Active Learning on Graphs

Combining Topological Analysis Matrices-Based Active Learning on Networked Data Classification