Machine Learning Operations: A Mapping Study

Abhijit Chakraborty, Suddhasvatta Das, Kevin Gary
2024-09-29
Abstract: Machine learning and AI have recently been embraced by many companies. Machine Learning Operations (MLOps) refers to the use of continuous software engineering processes, such as DevOps, in the deployment of machine learning models to production. Nevertheless, not all machine learning initiatives successfully transition to the production stage, owing to the multitude of intricate factors involved. This article discusses the issues that exist in several components of the MLOps pipeline, namely the data manipulation pipeline, the model building pipeline, and the deployment pipeline. A systematic mapping study is performed to identify the challenges that arise in MLOps systems, categorized by focus area. Using these data, realistic and applicable recommendations are offered for tools or solutions that can be used for their implementation. The main value of this work is that it maps distinctive challenges in MLOps along with the recommended solutions outlined in our study. These guidelines are not specific to any particular tool and are applicable to both research and industrial settings.
Software Engineering, Machine Learning
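The abstract describes MLOps as three chained pipelines: data manipulation, model building, and deployment. As an illustrative sketch only, not the authors' method, the hand-offs between the stages can be pictured as plain functions. The record format, the validation rules, the one-feature threshold classifier, and the 0.8 accuracy gate are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    """Hypothetical training record: one numeric feature and a binary label."""
    x: float
    y: int


def data_management(raw_rows):
    """DM stage: validate raw rows and keep only clean records."""
    clean = []
    for row in raw_rows:
        x, y = row.get("x"), row.get("y")
        if x is None or y not in (0, 1):
            continue  # drop rows with a missing feature or an invalid label
        clean.append(Record(float(x), int(y)))
    return clean


def model_creation(train):
    """MC stage: fit a threshold classifier at the midpoint of the class means.

    Assumes both classes are present in `train`.
    """
    pos_mean = mean(r.x for r in train if r.y == 1)
    neg_mean = mean(r.x for r in train if r.y == 0)
    threshold = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    # Predict 1 when x lies on the positive-class side of the threshold.
    return lambda x: int(sign * (x - threshold) >= 0)


def accuracy(model, data):
    """Fraction of records whose label the model predicts correctly."""
    return sum(model(r.x) == r.y for r in data) / len(data)


def model_deployment(model, holdout, min_accuracy=0.8):
    """MD stage: promote the model only if it clears a hypothetical quality gate."""
    score = accuracy(model, holdout)
    return {"deployed": score >= min_accuracy, "accuracy": score}


# Toy usage: two rows fail validation and are dropped before training.
raw = [
    {"x": 0.1, "y": 0}, {"x": 0.3, "y": 0}, {"x": None, "y": 1},
    {"x": 0.8, "y": 1}, {"x": 0.9, "y": 1}, {"x": 0.5, "y": 7},
]
model = model_creation(data_management(raw))
result = model_deployment(model, [Record(0.2, 0), Record(0.7, 1)])
print(result)  # {'deployed': True, 'accuracy': 1.0}
```

Because each stage consumes only what the previous one produces, the whole chain can be rerun automatically when new data arrives; in practice each function would be replaced by dedicated tooling.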
What problem does this paper attempt to address?
The main problem this paper addresses is the set of complex challenges that Machine Learning Operations (MLOps) faces in moving machine learning models from development into production. Specifically, the paper aims to:

1. **Identify challenges in MLOps systems**: Through a Systematic Mapping Study (SMS), the paper identifies and classifies problems in the different components of the MLOps pipeline (the data processing pipeline, the model building pipeline, and the deployment pipeline).
2. **Provide practical suggestions**: Based on the identified challenges, the paper recommends tools or solutions for addressing them. These suggestions are not tied to any specific tool and apply to both research and industrial environments.
3. **Fill research gaps**: The paper also aims to uncover knowledge gaps in the existing literature and to guide future research, particularly toward effective strategies for operationalizing MLOps pipelines.

### Specific problem description

- **Data Management Pipeline (DM)**: Covers tasks such as data access, management, and cleaning. Its challenges include a lack of diverse data samples, difficulties in data cleaning and validation, and problems with data labeling.
- **Model Creation Pipeline (MC)**: Covers feature selection, performance metric calculation, algorithm and hyperparameter selection, model evaluation, and experiment tracking. The main challenge is ensuring the model's performance and generalization ability.
- **Model Deployment Pipeline (MD)**: Covers model monitoring, deployment pipeline management, operation and feedback loops, and compatibility between development and production environments. The key challenge is keeping the model stable and performing well in a dynamic production environment.

### Research method

The paper adopts the Systematic Mapping Study (SMS) method: through extensive searching, screening, and analysis of the existing literature, it identifies research trends and novel contributions in the MLOps field. The steps are:

1. **Define research questions**: State the specific questions the study must answer.
2. **Search the literature extensively**: Retrieve candidate studies from multiple databases.
3. **Select high-quality studies**: Screen the candidates against preset criteria.
4. **Analyze and synthesize data**: Extract data from the selected studies and identify recurring themes and patterns.

### Results and discussion

Using these methods, the paper reaches the following conclusions:

- **Research trends**: MLOps research concentrates on model deployment (MD), especially model monitoring, deployment pipeline management, and operation and feedback loops. These areas account for approximately 52% of the research.
- **Novelty and innovation**: The study points to a variety of novel tools and solutions, such as Datalake for data management, AutoML for automated model selection, and Evidently AI for model monitoring.
- **Future research directions**: The paper identifies gaps in current research and suggests that future work pay more attention to the data management and model creation pipelines, so that MLOps can be optimized end to end.

Overall, through a systematic mapping study, this paper analyzes the current state of and challenges in the MLOps field and offers guidance for future academic and practical work.
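The screening and analysis steps of the SMS method (selecting studies against preset criteria, then identifying recurring themes) can be sketched as a filter followed by a tally. Everything below is hypothetical: the `Study` records are toy data rather than the paper's corpus, the inclusion criteria (peer review, a minimum year, a keyword match on the title) stand in for the paper's actual criteria, and the resulting shares say nothing about the paper's reported 52% figure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical candidate study returned by a database search."""
    title: str
    year: int
    peer_reviewed: bool
    focus_area: str  # "DM", "MC" or "MD"


def meets_criteria(study, keywords, min_year):
    """Hypothetical inclusion criteria applied in the screening step."""
    text = study.title.lower()
    return (study.peer_reviewed
            and study.year >= min_year
            and any(k in text for k in keywords))


def focus_area_shares(studies):
    """Percentage of selected studies per focus area, rounded to 0.1."""
    counts = Counter(s.focus_area for s in studies)
    total = sum(counts.values())
    return {area: round(100 * n / total, 1) for area, n in counts.items()}


# Toy candidates; the last one fails screening because it is not peer reviewed.
candidates = [
    Study("Monitoring ML models in production", 2022, True, "MD"),
    Study("Data validation for MLOps pipelines", 2021, True, "DM"),
    Study("Experiment tracking in MLOps", 2020, True, "MC"),
    Study("Deployment pipelines for MLOps", 2023, True, "MD"),
    Study("A blog post on DevOps", 2019, False, "MD"),
]
selected = [s for s in candidates if meets_criteria(s, ["mlops", "ml models"], 2018)]
print(focus_area_shares(selected))  # {'MD': 50.0, 'DM': 25.0, 'MC': 25.0}
```

Keeping the criteria in a single predicate makes the screening step reproducible: rerunning the search with the same predicate yields the same selection.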
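Model monitoring is the deployment concern the mapping finds most studied. One common way to flag data drift is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against the distribution seen at training time. The sketch below is a hand-rolled stand-in, not the API of Evidently AI or any other monitoring tool; the single numeric feature, the ten equal-width bins, and the 0.2 alert threshold (a commonly used rule of thumb) are assumptions.

```python
import math


def bin_proportions(values, lo, hi, bins):
    """Share of `values` in each of `bins` equal-width bins over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(reference, current, lo, hi, bins=10, eps=1e-6):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    expected = bin_proportions(reference, lo, hi, bins)
    actual = bin_proportions(current, lo, hi, bins)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def drift_alert(reference, current, lo, hi, threshold=0.2):
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return psi(reference, current, lo, hi) > threshold


reference = list(range(100))           # feature values seen at training time
shifted = [v + 50 for v in reference]  # production values after a shift
print(drift_alert(reference, reference, 0, 100))  # False
print(drift_alert(reference, shifted, 0, 100))    # True
```

An alert like this would typically feed the operation and feedback loop, for example by triggering retraining on fresh data.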