Abstract: Machine learning (ML) models have significantly impacted various domains in our everyday lives. While large language models (LLMs) offer intuitive interfaces and versatility, task-specific ML models remain valuable for their efficiency and focused performance in specialized tasks. However, developing these models requires technical expertise, making it particularly challenging for non-expert users to customize them for their unique needs. Although interactive machine learning (IML) aims to democratize ML development through user-friendly interfaces, users struggle to translate their requirements into appropriate ML tasks. We propose human-LLM collaborative ML as a new paradigm bridging human-driven IML and machine-driven LLM approaches. To realize this vision, we introduce DuetML, a framework that integrates multimodal LLMs (MLLMs) as interactive agents collaborating with users throughout the ML process. Our system carefully balances MLLM capabilities with user agency by implementing both reactive and proactive interactions between users and MLLM agents. Through a comparative user study, we demonstrate that DuetML enables non-expert users to define training data that better aligns with target tasks without increasing cognitive load, while offering opportunities for deeper engagement with ML task formulation.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **How can non-expert users more easily develop machine learning (ML) models for specific tasks?** Although large language models (LLMs) offer intuitive interfaces and versatility, task-specific ML models still have advantages in efficiency and focused performance. Developing these task-specific models, however, requires technical expertise, which is a major barrier for non-expert users. Interactive machine learning (IML) aims to democratize ML development through user-friendly interfaces, but users often have difficulty translating their needs into appropriate ML tasks. To address these problems, the authors propose a new paradigm, **human-LLM collaborative ML**, and introduce the DuetML framework. DuetML integrates multimodal LLMs (MLLMs) as interactive agents that collaborate with users throughout the ML process. The system balances MLLM capabilities with user agency by supporting both reactive and proactive interactions between users and MLLM agents.

### Main problem summary:

1. **High technical threshold**: Non-expert users lack programming skills and mathematical knowledge, so it is difficult for them to customize and optimize ML models for specific tasks.
2. **Difficulty in task definition**: Users have difficulty accurately translating their needs into specific ML tasks.
3. **Insufficiency of existing methods**: Although traditional IML methods simplify the ML development process, they fall short in helping users formulate tasks and create training data effectively.

### Solutions:

- **DuetML framework**: Through human-LLM collaboration, it combines the strengths of IML and LLMs to help non-expert users define tasks and create training data more effectively.
- **Dual-mode agents**: A reactive agent (which responds to user requests) and a proactive agent (which offers suggestions without being asked) cover the interaction needs of different users.

In this way, DuetML lets non-expert users define training data that better matches the target task without increasing their cognitive load, and it gives them opportunities to engage more deeply with ML task formulation.
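The split between a request-driven (reactive) agent and a suggestion-driven (proactive) agent can be illustrated with a minimal sketch. This is not DuetML's implementation: the `Session` fields, the class and method names, the `min_examples` threshold, and the `LLM` callable that stands in for an MLLM are all hypothetical, chosen only to show how the two interaction modes might differ in when they speak.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an MLLM call: prompt text in, reply text out.
LLM = Callable[[str], str]


@dataclass
class Session:
    """Shared task state visible to the user and both agents (assumed shape)."""
    task_description: str = ""
    labels: List[str] = field(default_factory=list)
    example_counts: Dict[str, int] = field(default_factory=dict)


class ReactiveAgent:
    """Speaks only when the user asks a question."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm

    def respond(self, session: Session, question: str) -> str:
        prompt = (
            f"Task: {session.task_description}\n"
            f"Labels: {', '.join(session.labels)}\n"
            f"User question: {question}"
        )
        return self.llm(prompt)


class ProactiveAgent:
    """Checks the session after a user action and may volunteer a suggestion."""

    def __init__(self, llm: LLM, min_examples: int = 5) -> None:
        self.llm = llm
        self.min_examples = min_examples  # assumed threshold, not from the paper

    def observe(self, session: Session) -> Optional[str]:
        # Find labels that have fewer training examples than the threshold.
        sparse = [
            label for label in session.labels
            if session.example_counts.get(label, 0) < self.min_examples
        ]
        if not sparse:
            return None  # stay silent so the user keeps control of the flow
        prompt = (
            f"Task: {session.task_description}\n"
            f"Labels with few examples: {', '.join(sparse)}\n"
            "Suggest what training data the user could add next."
        )
        return self.llm(prompt)


if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:
        # Placeholder; a real system would call an MLLM here.
        return f"(stub reply to a {len(prompt)}-character prompt)"

    session = Session("sort recyclables", ["can", "bottle"], {"can": 10, "bottle": 1})
    print(ReactiveAgent(stub_llm).respond(session, "Which labels should I add?"))
    print(ProactiveAgent(stub_llm).observe(session))
```

In this sketch the proactive agent returns `None` when it has nothing useful to add, which is one simple way to keep suggestions from crowding out the user's own decisions; the actual system may balance agency differently.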

DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users
