The rising costs of training frontier AI models

Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej, David Owen
2024-06-01
Abstract: The costs of training frontier AI models have grown dramatically in recent years, but there is limited public data on the magnitude and growth of these expenses. This paper develops a detailed cost model to address this gap, estimating training costs using three approaches that account for hardware, energy, cloud rental, and staff expenses. The analysis reveals that the amortized cost to train the most compute-intensive models has grown precipitously at a rate of 2.4x per year since 2016 (95% CI: 2.0x to 3.1x). For key frontier models, such as GPT-4 and Gemini, the most significant expenses are AI accelerator chips and staff costs, each costing tens of millions of dollars. Other notable costs include server components (15-22%), cluster-level interconnect (9-13%), and energy consumption (2-6%). If the trend of growing development costs continues, the largest training runs will cost more than a billion dollars by 2027, meaning that only the most well-funded organizations will be able to finance frontier AI models.
Computers and Society
### What problem does this paper attempt to address?

This paper addresses the lack of public data on how much it costs to train frontier AI models and how fast that cost is growing. It develops a detailed cost model that estimates hardware, energy, cloud rental, and staff expenses.

#### Main problems

1. **Transparency of cost estimation**: Training large-scale AI models is very expensive, yet publicly available data on these costs is limited. The paper aims to provide more transparent and detailed estimates.
2. **Cost growth trend**: The paper analyzes the amortized cost of training the most compute-intensive models since 2016 and finds that it has grown by 2.4x per year (95% confidence interval: 2.0x to 3.1x). Sustained at the central rate, costs would multiply roughly 33-fold every four years.
3. **Main cost components**: For key frontier models (such as GPT-4 and Gemini), the largest expenses are AI accelerator chips and staff costs, each reaching tens of millions of dollars. Other significant costs include server components (15-22%), cluster-level interconnect (9-13%), and energy consumption (2-6%).
4. **Future development trend**: If this cost growth trend continues, the largest training runs will cost more than $1 billion by 2027, which means that only the most well-funded organizations will be able to afford to develop frontier AI models.

#### Solutions

- **Three cost estimation methods**:
  1. **Amortized hardware capital expenditure (CapEx) + energy cost**: Accounts for hardware depreciation over the training run and for the energy consumed, giving a more accurate estimate of training cost.
  2. **Cloud price method**: Estimates cost from historical cloud rental prices. Although simple, it may overestimate the actual cost.
  3. **Comprehensive development cost method**: Covers not only the final training run but also the compute spent on experiments, evaluation, and fine-tuning, as well as the cost of R&D staff.

#### Conclusion

The paper documents the rapid growth of AI model training costs and breaks those costs down by component. This clarifies the current cost structure and provides a reference point for the economic challenges of future AI development. As models continue to scale, balancing technological progress against economic feasibility will become an increasingly pressing problem.
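
The growth rate reported above can be turned into a rough projection by compounding it year over year. The following is a minimal sketch, not the paper's cost model: the function names are illustrative, and the starting point (a $40 million run in 2023) is a hypothetical placeholder chosen only to fall within the "tens of millions of dollars" range mentioned in the summary. Only the 2.4x rate and its 95% interval come from the source.

```python
import math

# Growth rates taken from the summary: central estimate and 95% CI bounds.
GROWTH_PER_YEAR = 2.4
GROWTH_CI = (2.0, 3.1)


def project_cost(base_cost_usd, base_year, target_year, growth=GROWTH_PER_YEAR):
    """Compound a base cost forward (or backward) at a constant yearly factor."""
    return base_cost_usd * growth ** (target_year - base_year)


def year_threshold_crossed(base_cost_usd, base_year, threshold_usd,
                           growth=GROWTH_PER_YEAR):
    """Return the (fractional) year in which the projected cost hits a threshold."""
    years_needed = math.log(threshold_usd / base_cost_usd) / math.log(growth)
    return base_year + years_needed


if __name__ == "__main__":
    # ASSUMPTION: hypothetical starting point, not a figure from the source.
    BASE_COST_USD = 40e6
    BASE_YEAR = 2023

    for growth in (GROWTH_CI[0], GROWTH_PER_YEAR, GROWTH_CI[1]):
        cost_2027 = project_cost(BASE_COST_USD, BASE_YEAR, 2027, growth)
        crossing = year_threshold_crossed(BASE_COST_USD, BASE_YEAR, 1e9, growth)
        print(f"growth {growth:.1f}x/yr: 2027 cost ~ ${cost_2027 / 1e9:.2f}B, "
              f"$1B reached around {crossing:.1f}")
```

Under the placeholder starting point, the central rate puts the 2027 cost at about $1.3 billion, which is in line with the summary's statement that the largest runs would exceed $1 billion by 2027. The result is sensitive to both the assumed base cost and the growth rate, so looping over the interval bounds gives a sense of the spread rather than a single forecast.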