Abstract: Addressing the so-called "Red-AI" trend of rising energy consumption by large-scale neural networks, this study investigates the actual energy consumption, as measured by node-level watt-meters, of training various fully connected neural network architectures. We introduce the BUTTER-E dataset, an augmentation to the BUTTER Empirical Deep Learning dataset, containing energy consumption and performance data from 63,527 individual experimental runs spanning 30,582 distinct configurations: 13 datasets, 20 sizes (number of trainable parameters), 8 network "shapes", and 14 depths on both CPU and GPU hardware collected using node-level watt-meters. This dataset reveals the complex relationship between dataset size, network structure, and energy use, and highlights the impact of cache effects. We propose a straightforward and effective energy model that accounts for network size, computing, and memory hierarchy. Our analysis also uncovers a surprising, hardware-mediated non-linear relationship between energy efficiency and network design, challenging the assumption that reducing the number of parameters or FLOPs is the best way to achieve greater energy efficiency. Highlighting the need for cache-considerate algorithm development, we suggest a combined approach to energy efficient network, algorithm, and hardware design. This work contributes to the fields of sustainable computing and Green AI, offering practical guidance for creating more energy-efficient neural networks and promoting sustainable AI.

What problem does this paper attempt to address?

This paper addresses the growing energy consumption of training deep neural networks (DNNs), the so-called "Red AI" trend. As neural networks grow in scale and complexity, the computational resources and energy needed to train them grow rapidly, which brings high economic costs as well as significant carbon emissions and other environmental impacts. To meet this challenge, the study empirically measures the actual energy consumed when training different fully connected neural network architectures and offers recommendations for improving energy efficiency.

### Main contributions

1. **BUTTER-E dataset**: The study introduces a new dataset, BUTTER-E, containing data from 63,527 individual experimental runs that cover 30,582 distinct configurations: 13 datasets, 20 parameter counts, 8 network shapes, and 14 depths. The data was collected with node-level watt-meters on both CPU and GPU hardware.
2. **Characterization of non-linear relationships**: The study reveals hardware-mediated interactions between energy and hyperparameters, such as a non-linear relationship between parameter count and energy cost and a linear relationship between training set size and energy consumption per epoch.
3. **Proposal of an energy model**: Based on these interactions, the study proposes a simple and effective model of the energy consumption of fully connected neural networks. The model accounts for network size, computation, and the memory hierarchy.
4. **Discovery of counter-intuitive results**: By combining energy measurements with the BUTTER dataset, the study finds counter-intuitive effects of hyperparameter choices on energy efficiency. For example, reducing the number of parameters or FLOPs is not always the best way to improve energy efficiency.
5. **Suggestions for future research directions**: Based on these results, the study proposes research directions for optimizing energy efficiency across architecture, algorithm, and hardware design to address the challenges posed by "Red AI".

### Key findings

- **Hardware cache effect**: Cache effects play an important role in energy consumption. Energy consumption rises significantly as the network's parameter count approaches the hardware cache capacity.
- **Relationship between depth and energy consumption**: Network depth increases the energy consumed per batch, so deeper networks usually require more energy.
- **Non-linear relationships**: Energy consumption does not scale linearly with parameter count or FLOPs; the relationships are complex and non-linear.

### Conclusion

This study provides valuable insights for sustainable computing and Green AI, emphasizing that hardware, algorithms, and network structure should be considered together when designing energy-efficient neural networks. By providing actual measurements and practical guidance, it helps promote more energy-efficient deep learning practice.
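
The cache effect described above can be illustrated with a toy calculation. The sketch below is not the energy model proposed in the paper; it is a minimal illustration that assumes energy per batch splits into a compute term proportional to FLOPs and a memory term whose per-byte cost depends on the smallest memory level that holds the parameters. Every name and constant here (`MemoryLevel`, `HIERARCHY`, `JOULES_PER_FLOP`, the capacities, the factor of 6 FLOPs per parameter per example, and the three parameter passes per batch) is a hypothetical placeholder, not a value measured in BUTTER-E.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    name: str
    capacity_bytes: float   # hypothetical capacity of this level
    joules_per_byte: float  # hypothetical cost of moving one byte


# Illustrative values only; they are NOT measurements from BUTTER-E.
HIERARCHY = (
    MemoryLevel("L2", 1 * 2**20, 1e-11),
    MemoryLevel("L3", 32 * 2**20, 5e-11),
    MemoryLevel("DRAM", float("inf"), 5e-10),
)
JOULES_PER_FLOP = 1e-11  # hypothetical compute cost
BYTES_PER_PARAM = 4      # assumes 32-bit floating-point parameters


def serving_level(num_params: int) -> MemoryLevel:
    """Return the smallest level whose capacity holds all parameters."""
    working_set = num_params * BYTES_PER_PARAM
    for level in HIERARCHY:
        if working_set <= level.capacity_bytes:
            return level
    return HIERARCHY[-1]


def energy_per_batch(num_params: int, batch_size: int) -> float:
    """Toy estimate of joules per training batch for a dense network."""
    # Assumption: about 6 FLOPs per parameter per example
    # (forward plus backward pass of fully connected layers).
    flops = 6 * num_params * batch_size
    # Assumption: parameters are streamed three times per batch
    # (forward read, backward read, update write).
    bytes_moved = 3 * num_params * BYTES_PER_PARAM
    level = serving_level(num_params)
    return flops * JOULES_PER_FLOP + bytes_moved * level.joules_per_byte


if __name__ == "__main__":
    for n in (2**16, 2**18, 2**20, 2**24):
        per_param = energy_per_batch(n, batch_size=32) / n
        print(f"{n:>10} params  {serving_level(n).name:>4}  {per_param:.3e} J/param")
```

Under these assumed constants, the energy per parameter stays flat while the parameters fit in one level and steps up each time they spill into the next, which is one simple way a non-linear relationship between parameter count and energy could arise.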

Measuring the Energy Consumption and Efficiency of Deep Neural Networks: An Empirical Analysis and Design Recommendations

Survey on Energy-Efficient Deep Neural Networks for Computer Vision

Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI

Watt For What: Rethinking Deep Learning's Energy-Performance Relationship

Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices

Benchmarking Resource Usage for Efficient Distributed Deep Learning

A methodological framework for optimizing the energy consumption of deep neural networks: a case study of a cyber threat detector

On-Device Deep Learning: Survey on Techniques Improving Energy Efficiency of DNNs

Data-Centric Green AI: An Exploratory Empirical Study

Energy Efficiency of Training Neural Network Architectures: An Empirical Study

Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models

Carbon Emissions and Large Neural Network Training

The Power of Training: How Different Neural Network Setups Influence the Energy Demand

Evaluating Performance, Power and Energy of Deep Neural Networks on CPUs and GPUs.

Double-Exponential Increases in Inference Energy: The Cost of the Race for Accuracy

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node

A Transistor Operations Model for Deep Learning Energy Consumption Scaling Law

NeurstrucEnergy: A Bi-Directional GNN Model for Energy Prediction of Neural Networks in IoT

Towards energy-efficient Deep Learning: An overview of energy-efficient approaches along the Deep Learning Lifecycle

Understanding the Energy Consumption of HPC Scale Artificial Intelligence