The Unseen AI Disruptions for Power Grids: LLM-Induced Transients

Yuzhuo Li, Mariam Mughees, Yize Chen, Yunwei Ryan Li
2024-09-09
Abstract: Recent breakthroughs in large language models (LLMs) have demonstrated superior capability across major industries and stimulated multi-hundred-billion-dollar investment in AI-centric data centers over the next 3-5 years. This, in turn, raises growing concerns about sustainability and AI-related energy usage. However, there is a largely overlooked issue as challenging and critical as AI model and infrastructure efficiency: the disruptive dynamic power consumption behaviour. With fast, transient dynamics, AI infrastructure features ultra-low inertia, sharp power surges and dips, and a significant peak-idle power ratio. The power scale ranges from several hundred watts to megawatts, and even to gigawatts. These never-before-seen characteristics make AI a unique load and pose threats to power grid reliability and resilience. To reveal this hidden problem, this paper examines the scale of AI power consumption, analyzes AI transient behaviour in various scenarios, develops high-level mathematical models to depict AI workload behaviour, and discusses the multifaceted challenges and opportunities they potentially bring to existing power grids. Observing the rapidly evolving machine learning (ML) and AI technologies, this work emphasizes the critical need for interdisciplinary approaches to ensure reliable and sustainable AI infrastructure development, and provides a starting point for researchers and practitioners to tackle such challenges.
Hardware Architecture, Artificial Intelligence, Performance, Systems and Control
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper aims to reveal and analyze the unprecedented dynamic power consumption behaviors of large-scale artificial intelligence (AI) computing, especially large language models (LLMs), and their impact on the power system. Specifically, the paper focuses on the following aspects:

1. **Dynamic power consumption behaviors**:
   - AI infrastructure has ultra-low inertia, sharp power surges and drops, and a significant peak-idle power ratio. These characteristics make AI a unique load and pose a threat to the reliability and resilience of the power grid.
   - For example, LLM training can jump from a cold start to a peak load of megawatts in a few seconds.
2. **Reliability and sustainability of the power system**:
   - The paper emphasizes that the rapid deployment and service of AI computing bring a surge in power demand, which may burden the existing distribution system, leading to problems such as voltage and current fluctuations, and affecting the stable operation of the power system.
   - Without proper planning and management, large-scale AI models may cause peak loads during the training and deployment stages, further burdening the local distribution system.
3. **The need for interdisciplinary solutions**:
   - The paper points out that an interdisciplinary approach is required to ensure the reliable and sustainable development of AI infrastructure. This includes, but is not limited to, cooperation in power engineering, computer science, and environmental science.
   - It provides a starting point to help researchers and practitioners meet these challenges, so that the rapid development of AI technology does not cause uncontrollable impacts on the power system.
4. **Mathematical modeling and case studies**:
   - The paper develops high-level mathematical models to describe the behavior of AI workloads and explores the power consumption characteristics of AI in different operation stages (such as training, fine-tuning, and inference) through multiple case studies.
   - These models help to understand the unique characteristics of AI workloads, such as continuous high power consumption, rapid fluctuations, and the relationship between computational load and power consumption.
5. **Future research directions**:
   - The paper summarizes future research directions from different perspectives (AI user side, data center side, and power grid side) to meet the upcoming challenges.
   - It emphasizes the importance of optimizing power management and infrastructure design to adapt to the rapid growth of AI computing.

### Key formulas

- **Total data center power consumption model**:
  \[ P_{\text{total}} = P_{\text{AC Bus}} + P_{\text{External}} \]
  where \(P_{\text{AC Bus}}\) represents the power provided by the distribution power grid, and \(P_{\text{External}}\) represents all external power sources and energy, such as renewable energy generation, natural gas, fuel oil, etc.
- **Power consumption model of supporting infrastructure**:
  \[ P_{\text{Supporting Infra}} = \eta_{\text{AC/AC}} \cdot (P_{\text{AHU}} + P_{\text{Chillers}} + P_{\text{CoolingTower}} + P_{\text{Pumps}} + P_{\text{Humidifiers}} + P_{\text{BMS}} + P_{\text{Lighting}} + P_{\text{Office}} + P_{\text{UPS Infra}} + P_{\text{Network Infra}}) \]
- **IT power consumption model**:
  \[ P_{\text{IT Power}} = \eta_{\text{AC/DC}} \cdot (P_{\text{Servers}} + P_{\text{NetworkGear}} + P_{\text{Storage}} + P_{\text{CRAC}} + P_{\text{UPS IT}}) \]
- **Simplified model of AI workload power consumption**: