Human-in-the-loop or AI-in-the-loop? Automate or Collaborate?

Sriraam Natarajan, Saurabh Mathur, Sahil Sidheekh, Wolfgang Stammer, Kristian Kersting
2024-12-19
Abstract: Human-in-the-loop (HIL) systems have emerged as a promising approach for combining the strengths of data-driven machine learning models with the contextual understanding of human experts. However, a deeper look into several of these systems reveals that calling them HIL would be a misnomer, as they are quite the opposite, namely AI-in-the-loop ($AI^2L$) systems, where the human is in control of the system, while the AI is there to support the human. We argue that existing evaluation methods often overemphasize the machine (learning) component's performance, neglecting the human expert's critical role. Consequently, we propose an $AI^2L$ perspective, which recognizes that the human expert is an active participant in the system, significantly influencing its overall performance. By adopting an $AI^2L$ approach, we can develop more comprehensive systems that faithfully model the intricate interplay between the human and machine components, leading to more effective and robust AI systems.
Human-Computer Interaction
What problem does this paper attempt to address?
The main problem this paper addresses is the conflation of "human-in-the-loop" (HIL) systems with "AI-in-the-loop" ($AI^2L$) systems. It also argues that existing evaluation methods overemphasize the performance of the machine learning component while neglecting the crucial role of the human expert.

### Specific problems include:

1. **Confusion in definition**: Many existing systems are called HIL systems but actually fit the characteristics of $AI^2L$. In an HIL system, the AI is mainly responsible for decision-making and humans provide feedback; in an $AI^2L$ system, the human is in control and the AI assists the human's decision-making.
2. **Limitations of evaluation methods**: Existing evaluation methods usually focus only on model performance metrics (such as accuracy, precision, and recall) and ignore the role of the human expert in the system. Such evaluation does not suit $AI^2L$ systems, which depend on the interaction between human and AI and on whether the overall goal is achieved.
3. **Influence on system design and deployment**: HIL and $AI^2L$ systems differ significantly in who holds control, where bias originates, and which evaluation criteria apply, so their characteristics must be clearly distinguished when such systems are designed and deployed. Failing to distinguish them may lead to abstraction errors, model biases, and inappropriate evaluation criteria.

### Core viewpoints of the paper:

- **Redefinition and classification**: The authors argue that existing HIL systems should be re-examined, that HIL and $AI^2L$ systems should be clearly distinguished, and that the design approach should be chosen according to the specific application scenario.
- **New evaluation framework**: A more comprehensive evaluation framework is proposed, which considers not only model performance metrics but also the interaction between humans and AI, system transparency, interpretability, and fairness.
- **Guidance for practical applications**: Specific examples (from fields such as medicine, autonomous driving, and logistics) illustrate how to distinguish HIL and $AI^2L$ systems in practice and provide guidance for system designers.

In short, this paper aims to promote a deeper understanding and correct application of HIL and $AI^2L$ systems in academia and industry, so that more effective and reliable AI systems can be developed.
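The evaluation point can be made concrete with a small sketch. The paper does not prescribe specific metrics or code, so everything below is an illustrative assumption: the `Case` record, the three metric functions, and the five toy cases are hypothetical and are not results from the paper. The sketch contrasts a model-only score with a score for the decisions the human actually made after seeing the AI's suggestion.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One decision in a hypothetical AI-in-the-loop workflow."""
    truth: int           # ground-truth label
    ai_suggestion: int   # label recommended by the AI component
    human_decision: int  # label the human expert finally chose


def model_accuracy(cases):
    """Model-only view: how often the AI suggestion matches the truth."""
    return sum(c.ai_suggestion == c.truth for c in cases) / len(cases)


def team_accuracy(cases):
    """System-level view: how often the human's final decision is correct."""
    return sum(c.human_decision == c.truth for c in cases) / len(cases)


def override_rate(cases):
    """Interaction signal: how often the human departs from the AI suggestion."""
    return sum(c.human_decision != c.ai_suggestion for c in cases) / len(cases)


# Toy data, invented purely for illustration.
cases = [
    Case(truth=1, ai_suggestion=1, human_decision=1),
    Case(truth=0, ai_suggestion=1, human_decision=0),  # human corrects the AI
    Case(truth=1, ai_suggestion=0, human_decision=1),  # human corrects the AI
    Case(truth=0, ai_suggestion=0, human_decision=0),
    Case(truth=1, ai_suggestion=1, human_decision=0),  # human overrides a correct suggestion
]

print("model accuracy:", model_accuracy(cases))
print("team accuracy: ", team_accuracy(cases))
print("override rate: ", override_rate(cases))
```

In this toy setup the model-only score and the team score differ, and neither can be derived from the other. This is one simple way to see why reporting only the model's metrics may misrepresent a system in which the human makes the final call.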