Abstract: Human-in-the-loop techniques play an increasingly significant role in the machine learning pipeline, which consists of data preprocessing, data labeling, model training, and inference. Humans can not only provide training data for machine learning applications, but can also, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. In this paper, we first summarize the human-in-the-loop techniques in machine learning, including: (1) Data extraction: unstructured data often needs to be transformed into structured data for feature engineering, where humans can provide training data or generate rules for extraction. (2) Data integration: to enrich data or features, data integration joins in other tables, and humans can help with join operations that are hard for machines. (3) Data cleaning: real-world data is often dirty. We can leverage humans’ intelligence to clean the data and to induce rules that clean more of it. (4) Data annotation and iterative labeling: machine learning typically requires a large volume of high-quality training data, which humans can provide. When the budget is limited, iterative labeling is used to label the most informative examples. (5) Model training and inference: for different applications (e.g., classification, clustering), different ML techniques are used to train the model and perform inference given human labels. We then summarize several commonly used techniques in human-in-the-loop machine learning that are applied in the above modules, including quality improvement, cost reduction, latency reduction, active learning, and weak supervision. Finally, we present some open challenges and opportunities.
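
As a rough illustration of the iterative labeling idea in item (4), the sketch below uses uncertainty sampling, a common active learning heuristic that the abstract does not specifically name. It assumes a binary classifier that outputs a probability for each unlabeled example. The function name `select_for_labeling`, the example ids, and the probabilities are all hypothetical.

```python
from typing import Dict, List


def select_for_labeling(probs: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` example ids whose predicted probability is closest to 0.5."""

    # Uncertainty of a binary prediction: 0.0 when p is 0 or 1, 0.5 when p is 0.5.
    def uncertainty(p: float) -> float:
        return 0.5 - abs(p - 0.5)

    # Rank example ids from most to least uncertain, then cut at the budget.
    ranked = sorted(probs, key=lambda ex: uncertainty(probs[ex]), reverse=True)
    return ranked[:max(budget, 0)]


# Hypothetical model outputs: P(label = 1) for four unlabeled examples.
unlabeled_probs = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.61}

# With a budget of two human labels, ask about the two least certain examples.
to_label = select_for_labeling(unlabeled_probs, budget=2)
print(to_label)
```

In an iterative setting, the selected examples would be sent to human annotators, the model retrained on the new labels, and the selection repeated until the budget is spent.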

Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop

Human-centred Design on Crowdsourcing Annotation Towards Improving Active Learning Model Performance

Human-Machine Collaboration for Fast Land Cover Mapping

Label Assistant: A Workflow for Assisted Data Annotation in Image Segmentation Tasks

Learning to Label with Active Learning and Reinforcement Learning

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Automatic Text Labeling Method Based on Large Language Models

Active Learning with Label Quality Control

Human-in-the-loop Techniques in Machine Learning

Learning with Different Amounts of Annotation: From Zero to Many Labels

Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation

Label Smarter, Not Harder: CleverLabel for Faster Annotation of Ambiguous Image Classification with Higher Quality

ActiveLab: Active Learning with Re-Labeling by Multiple Annotators

LabelSens: Enabling Real-time Sensor Data Labelling at the point of Collection on Edge Computing

Human-LLM Collaborative Annotation Through Effective Verification of LLM Labels

Synergistic Training: Harnessing Active Learning and Pseudo-Labeling for Enhanced Model Performance in Deep Learning

Salutary Labeling with Zero Human Annotation

Learning Image Labels On-the-fly for Training Robust Classification Models

Biological data annotation via a human-augmenting AI-based labeling system

Thinking Like an Annotator: Generation of Dataset Labeling Instructions

Making Large Language Models Better Data Creators