Identifying architectural design decisions for achieving green ML serving

Francisco Durán, Silverio Martínez-Fernández, Matias Martinez, Patricia Lago
DOI: https://doi.org/10.1145/3644815.3644962
2024-02-13
Abstract: The growing use of large machine learning models highlights concerns about their increasing computational demands. While the energy consumption of their training phase has received attention, fewer works have considered the inference phase. For ML inference, the binding of ML models to the ML system for user access, known as ML serving, is a critical yet understudied step for achieving efficiency in ML applications. We examine the literature in ML architectural design decisions and Green AI, with a special focus on ML serving. The aim is to analyze ML serving architectural design decisions for the purpose of understanding and identifying them with respect to quality characteristics from the point of view of researchers and practitioners in the context of ML serving literature. Our results (i) identify ML serving architectural design decisions along with their corresponding components and associated technological stack, and (ii) provide an overview of the quality characteristics studied in the literature, including energy efficiency. This preliminary study is the first step in our goal to achieve green ML serving. Our analysis may aid ML researchers and practitioners in making green-aware architecture design decisions when serving their models.
Machine Learning, Software Engineering
What problem does this paper attempt to address?
The paper addresses how architectural design decisions in machine learning (ML) serving can be made greener, that is, with lower energy consumption and environmental impact. The research focuses on the inference phase that follows model training, and specifically on ML serving: the binding of a trained model to the ML system so that users can access it. While energy consumption during the training phase has received attention, fewer works have studied the inference phase. The paper reviews the literature on ML architectural design decisions and Green AI, with a special focus on ML serving. The objective is to identify ML serving architectural design decisions and to understand them with respect to quality characteristics, from the point of view of researchers and practitioners. The findings list the major design decisions for ML serving, namely serving with a runtime engine, serving without a runtime engine, deep learning (DL)-specific serving software, and end-to-end ML cloud services, and give an overview of the quality characteristics studied for these decisions, including energy efficiency. The study found that performance efficiency is the most commonly considered quality characteristic, while research on energy efficiency is relatively scarce. Furthermore, some cross-domain decisions were identified, such as containerization, model formats, request processing, and communication protocols, which relate to the serving infrastructure and may affect serving efficiency and sustainability. The paper emphasizes the need for more in-depth research on how different design decisions affect the energy consumption of ML serving, in order to promote green ML serving. This will help researchers and practitioners make green-aware architecture decisions when serving their models.