What problem does this paper attempt to address?

This paper aims to reduce the carbon emissions generated when large-scale machine learning (ML) inference services run in data centers. Specifically, it proposes Clover, a carbon-friendly ML inference service runtime system that balances performance, accuracy, and carbon emissions through mixed-quality models and GPU resource partitioning. The main goal is to significantly reduce carbon emissions while maintaining high accuracy and meeting service-level agreement (SLA) targets.

### Background and Motivation

1. **Carbon Emission Problems**:
   - Computing services in data centers account for 2% of global carbon emissions, and this share may increase further as the amount of data grows.
   - Large technology companies are increasingly integrating artificial intelligence (AI) into their products and hosting trained ML models on GPUs in data centers to provide ML inference services. These inference services exacerbate the carbon emission problem because they occupy a large number of computing cycles in data centers.
   - For example, many of Google's services with billions of users rely on AI, and its inference services account for 60% of AI infrastructure emissions; Meta has expanded its infrastructure capacity by 2.5 times to meet ML inference requirements; AWS and NVIDIA estimate that inference accounts for 90% of high-performance computing (HPC) and cloud data-center ML workloads.
2. **Trade-offs between Carbon-friendliness, Performance, and Accuracy**:
   - Carbon-friendliness usually conflicts with performance (low inference latency) and with inference accuracy (high accuracy requires complex models and more computationally intensive operations, which increases the carbon footprint).
   - Currently, there is a lack of effective tools that automatically navigate this multi-dimensional trade-off space to make ML inference services carbon-friendly.

### Key Ideas and Contributions of Clover

1. **Opportunities for Mixed-quality Models and GPU Partitioning**:
   - For the first time, this paper experimentally demonstrates the opportunities and trade-offs of mixed-quality models and GPU partitioning in terms of carbon emission reduction.
   - Experiments show that creating a mixture of different-quality models can achieve significant carbon emission reduction while maintaining high accuracy.
   - GPU partitioning can reduce carbon emissions by optimizing resource utilization, but it may increase latency and lead to violations of SLA targets.
2. **A Novel Carbon-aware ML Inference Framework**:
   - Clover is a new carbon-footprint-aware ML inference service that can reduce carbon emissions, achieve high accuracy, and meet SLA targets.
   - This framework combines two seemingly unrelated concepts, mixed-quality models and hardware-supported GPU partitioning, to minimize the carbon footprint.
   - Clover's intelligent GPU partitioning improves resource utilization efficiency and provides opportunities for carbon emission savings, but it may affect the SLA. Clover alleviates this challenge by using model variants of different quality: low-quality models allow Clover to minimize SLA violations, while high-quality models allow it to achieve high accuracy.
   - Clover's optimization engine dynamically adapts to changes in the carbon intensity of data-center energy sources to opportunistically achieve carbon and accuracy targets while meeting SLA targets. The optimization process is completely online, does not require offline training data, and is practical.

### Evaluation Results

- The real-world evaluation of the Clover system shows that it is very effective in reducing carbon emissions during model inference while still achieving high accuracy and meeting SLA constraints.
- The evaluation is carried out in a representative production environment, using real-world carbon-intensity traces and ML models covering natural-language processing (BERT), object detection, and image classification.
- The real-system prototype shows that Clover's performance is close to that of an Oracle scheme, which is practically infeasible.

### Summary

The paper proposes Clover, a carbon-friendly ML inference service runtime system. Through mixed-quality models and GPU resource partitioning, it reduces carbon emissions while maintaining high accuracy and meeting SLA targets. The system is intended as a catalyst for the community to enhance and develop carbon-aware ML inference services.
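
The interplay between carbon intensity and model quality described above can be made concrete with a small sketch. It is not Clover's optimization engine. It is a toy illustration that assumes operational carbon per request equals energy per request times grid carbon intensity, and that traffic is split between only two model variants. The names `Variant`, `grams_co2`, and `best_mix`, the per-request carbon budget, and all numbers are hypothetical. Latency, SLA checks, and GPU partitioning are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


@dataclass(frozen=True)
class Variant:
    """A hypothetical model variant; all figures are illustrative."""
    name: str
    accuracy: float   # accuracy in [0, 1]
    energy_j: float   # energy per request, in joules


def grams_co2(energy_j: float, intensity_g_per_kwh: float) -> float:
    """Operational carbon in grams: energy (kWh) x carbon intensity (g/kWh)."""
    return energy_j / JOULES_PER_KWH * intensity_g_per_kwh


def best_mix(
    low: Variant,
    high: Variant,
    intensity_g_per_kwh: float,
    budget_g: float,
    steps: int = 100,
) -> Optional[Tuple[float, float, float]]:
    """Find the largest share of traffic for `high` that fits the budget.

    Assumes `high` is both more accurate and more energy-hungry than `low`,
    so scanning shares from 1.0 down to 0.0 and stopping at the first
    feasible one maximizes average accuracy under the carbon budget.
    Returns (share_high, avg_accuracy, grams_per_request), or None when
    even serving every request with `low` exceeds the budget.
    """
    for i in range(steps, -1, -1):
        share = i / steps
        energy = share * high.energy_j + (1 - share) * low.energy_j
        grams = grams_co2(energy, intensity_g_per_kwh)
        if grams <= budget_g:
            accuracy = share * high.accuracy + (1 - share) * low.accuracy
            return share, accuracy, grams
    return None
```

Calling `best_mix` with the same budget at different carbon intensities shows the adaptive behavior in miniature: when the grid is clean, all traffic can go to the high-quality variant, and as intensity rises, a growing share has to move to the low-quality variant to stay within the budget.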

Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service

Towards Carbon-Neutral Edge Computing: Greening Edge AI by Harnessing Spot and Future Carbon Markets

Measuring the Carbon Intensity of AI in Cloud Instances

CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training

Beyond Efficiency: Scaling AI Sustainably

LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models

Computing Within Limits: An Empirical Study of Energy Consumption in ML Training and Inference

Towards more sustainable enterprise data and application management with cross silo Federated Learning and Analytics

Measuring the Effectiveness of Carbon-Aware AI Training Strategies in Cloud Instances: A Confirmation Study

A House United Within Itself: SLO-Awareness for On-Premises Containerized ML Inference Clusters via Faro

Carbon Footprint Reduction for Sustainable Data Centers in Real-Time

Towards Environmentally Equitable AI via Geographical Load Balancing

Identifying architectural design decisions for achieving green ML serving

Carbon Emissions and Large Neural Network Training

An Energy and Carbon Footprint Analysis of Distributed and Federated Learning

Carbon Intensity-Aware Adaptive Inference of DNNs

Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving