Measuring the Energy Consumption and Efficiency of Deep Neural Networks: An Empirical Analysis and Design Recommendations

Charles Edison Tripp, Jordan Perr-Sauer, Jamil Gafur, Ambarish Nag, Avi Purkayastha, Sagi Zisman, Erik A. Bensen
2024-03-13
Abstract: Addressing the so-called "Red-AI" trend of rising energy consumption by large-scale neural networks, this study investigates the actual energy consumption, as measured by node-level watt-meters, of training various fully connected neural network architectures. We introduce the BUTTER-E dataset, an augmentation to the BUTTER Empirical Deep Learning dataset, containing energy consumption and performance data from 63,527 individual experimental runs spanning 30,582 distinct configurations: 13 datasets, 20 sizes (number of trainable parameters), 8 network "shapes", and 14 depths on both CPU and GPU hardware collected using node-level watt-meters. This dataset reveals the complex relationship between dataset size, network structure, and energy use, and highlights the impact of cache effects. We propose a straightforward and effective energy model that accounts for network size, computing, and memory hierarchy. Our analysis also uncovers a surprising, hardware-mediated non-linear relationship between energy efficiency and network design, challenging the assumption that reducing the number of parameters or FLOPs is the best way to achieve greater energy efficiency. Highlighting the need for cache-considerate algorithm development, we suggest a combined approach to energy efficient network, algorithm, and hardware design. This work contributes to the fields of sustainable computing and Green AI, offering practical guidance for creating more energy-efficient neural networks and promoting sustainable AI.
Artificial Intelligence, Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper addresses the growing energy consumption of training deep neural networks (DNNs), the so-called "Red AI" trend. As networks grow in scale and complexity, the computational resources and energy needed to train them grow as well, which raises economic costs and leads to significant carbon emissions and other environmental impacts. The study measures the actual energy consumed when training different fully connected neural network architectures and, based on that empirical analysis, recommends ways to improve energy efficiency.

### Main contributions

1. **BUTTER-E dataset**: The study introduces a new dataset, BUTTER-E, containing data from 63,527 individual experimental runs that cover 30,582 distinct configurations: 13 datasets, 20 sizes (number of trainable parameters), 8 network shapes, and 14 depths. The data was collected with node-level watt-meters on both CPU and GPU hardware.
2. **Characterization of non-linear relationships**: The study reveals hardware-mediated interactions between energy use and hyperparameters, such as a non-linear relationship between the number of parameters and energy cost, and a linear relationship between training set size and energy consumption per epoch.
3. **Proposal of an energy model**: Based on these interactions, the study proposes a simple and effective model of the energy consumption of fully connected neural networks. The model accounts for network size, computing, and the memory hierarchy.
4. **Discovery of counter-intuitive results**: By combining energy measurements with the BUTTER dataset, the study finds counter-intuitive effects of hyperparameter choices on energy efficiency. For example, reducing the number of parameters or FLOPs is not always the best way to improve energy efficiency.
5. **Suggestions for future research directions**: Based on these results, the study proposes research directions for improving energy efficiency through network architecture, algorithm, and hardware design, in response to the challenges posed by "Red AI".

### Key findings

- **Hardware cache effect**: Cache effects play an important role in energy consumption. When the number of network parameters approaches the hardware cache capacity, energy consumption increases significantly.
- **Relationship between depth and energy consumption**: Energy consumption per batch increases with network depth, and deeper networks usually require more energy.
- **Non-linear relationships**: Energy consumption is not a linear function of the number of parameters or of FLOPs; the relationships are more complex and non-linear.

### Conclusion

This study provides valuable insights for sustainable computing and Green AI, emphasizing that hardware, algorithms, and network structure should be considered together when designing energy-efficient neural networks. By providing measured data and practical guidance, it helps promote more energy-efficient deep learning practices.
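To make the cache effect and the idea of a memory-hierarchy-aware energy model more concrete, here is a minimal sketch in Python. It is not the model proposed in the paper: the names `MemoryLevel`, `HIERARCHY`, and `energy_per_step`, the cache capacities, and all per-FLOP and per-byte energy costs are hypothetical values chosen only for illustration. The sketch assumes roughly 6 FLOPs per parameter per sample for a forward and backward pass through dense layers, and that the parameters are touched about three times per training step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    name: str
    capacity_bytes: float   # hypothetical capacity of this level
    joules_per_byte: float  # hypothetical energy cost per byte accessed


# Hypothetical hierarchy, ordered from smallest and cheapest to largest and costliest.
HIERARCHY = (
    MemoryLevel("L2", 1e6, 1e-11),
    MemoryLevel("L3", 3e7, 5e-11),
    MemoryLevel("DRAM", float("inf"), 5e-10),
)


def energy_per_step(num_params, batch_size, bytes_per_param=4,
                    joules_per_flop=1e-11, hierarchy=HIERARCHY):
    """Rough energy estimate (joules) for one training step of a dense network.

    Assumptions (illustrative only):
      * about 6 FLOPs per parameter per sample (forward plus backward pass);
      * parameters are accessed about 3 times per step (forward, backward, update);
      * every access is served by the smallest level that holds all parameters.
    """
    flops = 6 * num_params * batch_size
    working_set = num_params * bytes_per_param
    # Pick the first (smallest) level whose capacity covers the working set.
    level = next(lvl for lvl in hierarchy if working_set <= lvl.capacity_bytes)
    bytes_moved = 3 * working_set
    return flops * joules_per_flop + bytes_moved * level.joules_per_byte


if __name__ == "__main__":
    # Energy per parameter stays flat within a level and jumps at each boundary.
    for n in (100_000, 200_000, 300_000, 5_000_000, 10_000_000):
        per_param = energy_per_step(n, batch_size=32) / n
        print(f"{n:>10} params: {per_param:.3e} J per parameter per step")
```

Under these assumed constants, the estimated energy per parameter is constant while the parameters fit in one level and steps up each time the working set spills into the next level. This is one simple way a relationship between parameter count and energy can become non-linear even though FLOPs scale linearly with parameter count.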