Abstract: Machine Learning (ML) experiment management tools support ML practitioners and software engineers when building intelligent software systems. By managing large numbers of ML experiments comprising many different ML assets, they not only facilitate engineering ML models and ML-enabled systems, but also managing their evolution, for instance by tracing system behavior to concrete experiments when the model performance drifts. However, while ML experiment management tools have become increasingly popular, little is known about their effectiveness in practice, or about their actual benefits and challenges. We present a mixed-methods empirical study of experiment management tools and the support they provide to users. First, our survey of 81 ML practitioners sought to determine the benefits and challenges of ML experiment management and of the existing tool landscape. Second, a controlled experiment with 15 student developers investigated the effectiveness of ML experiment management tools. We learned that 70% of our survey respondents perform ML experiments using specialized tools, while out of those who do not use such tools, 52% are unaware of experiment management tools or of their benefits. The controlled experiment showed that experiment management tools offer valuable support to users to systematically track and retrieve ML assets. Using ML experiment management tools reduced error rates and increased completion rates. By presenting a user's perspective on experiment management tools, and the first controlled experiment in this area, we hope that our results foster the adoption of these tools in practice and direct tool builders and researchers to improve the tool landscape overall.

What problem does this paper attempt to address?

This paper examines the effectiveness, benefits, and challenges of machine learning (ML) experiment management tools in practice. Although these tools are growing in popularity, little is known about how well they work or what specific benefits they deliver. To find out, the authors ran a mixed-methods empirical study consisting of a survey of 81 ML practitioners and a controlled experiment with 15 student developers, collecting data on the challenges users face, the support the tools provide, and the benefits actually realized. The survey found that 70% of respondents use specialized tools for their ML experiments, while 52% of those who do not are unaware of experiment management tools or of their benefits. The controlled experiment showed that using ML experiment management tools reduced error rates, increased completion rates, and helped users systematically track and retrieve ML assets. These tools are designed to support the development of ML models and intelligent software systems by managing large numbers of experiments and the assets involved, including datasets, models, code, and parameters. Traditional version control systems are not fully suited to ML development because they do not provide the level of abstraction needed to explore a project's history. The paper therefore emphasizes the role of experiment management tools in version tracking, traceability, auditability, reproducibility, and collaboration, which lets users compare experiment iterations and answer factual questions about the assets of ongoing or completed experiments.
The goal of the paper is to promote the adoption of these tools in practice, provide insights for researchers and tool developers to improve the tools, and offer recommendations for educators to train software engineers in building ML-driven systems using these tools. In this way, they hope to drive the development of ML experiment management tools, making them more effective and increasing their application in the industry.
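
To make "systematically tracking and retrieving ML assets" concrete, here is a minimal Python sketch of the idea. It is not taken from the paper and does not reproduce the API of any real experiment management tool. The names `Run`, `ExperimentLog`, `start_run`, `query`, and `best`, along with the file names and metric values, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Run:
    """One experiment run and the assets recorded for it (hypothetical schema)."""
    run_id: int
    params: dict[str, Any]
    metrics: dict[str, float] = field(default_factory=dict)
    artifacts: dict[str, str] = field(default_factory=dict)  # asset name -> path


class ExperimentLog:
    """In-memory log of runs; a real tool would persist this to disk or a server."""

    def __init__(self) -> None:
        self._runs: list[Run] = []

    def start_run(self, **params: Any) -> Run:
        # Record the parameters up front so every run is traceable later.
        run = Run(run_id=len(self._runs) + 1, params=dict(params))
        self._runs.append(run)
        return run

    def query(self, predicate: Callable[[Run], bool]) -> list[Run]:
        # Answer factual questions such as "which runs used dataset X?".
        return [r for r in self._runs if predicate(r)]

    def best(self, metric: str) -> Run:
        # Compare experiment iterations on a single recorded metric.
        scored = [r for r in self._runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run recorded metric {metric!r}")
        return max(scored, key=lambda r: r.metrics[metric])


# Illustrative data only: three runs with made-up accuracies.
log = ExperimentLog()
for lr, acc in [(0.1, 0.81), (0.01, 0.88), (0.001, 0.84)]:
    run = log.start_run(learning_rate=lr, dataset="train-v1.csv")
    run.metrics["accuracy"] = acc
    run.artifacts["model"] = f"models/run{run.run_id}.pkl"

best = log.best("accuracy")
print(best.run_id, best.params["learning_rate"], best.artifacts["model"])
# prints: 2 0.01 models/run2.pkl
```

A plain version control history would show that files changed, but answering "which parameters produced the best model?" requires the run-level abstraction sketched here, which is the gap the summary attributes to traditional version control systems.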

Machine learning experiment management tools: a mixed-methods empirical study
