Abstract: Modern embedded systems execute applications that interact with the operating system and hardware differently depending on the type of workload. These cross-layer interactions cause wide variations in the chip-wide thermal profile. In this article, a reinforcement learning-based runtime manager is proposed that guarantees application-specific performance requirements and controls POSIX thread allocation and voltage/frequency scaling for energy-efficient thermal management. The manager targets three thermal aspects: peak temperature, average temperature, and thermal cycling. Unlike existing learning-based runtime approaches, which optimize energy and temperature individually, the proposed runtime manager is the first approach to combine the two objectives and address all three thermal aspects simultaneously. However, determining thread allocation and core frequencies to optimize energy and temperature is an NP-hard problem. Because the joint space of thread allocations and frequency settings grows exponentially, a direct learning formulation leads to exponential growth of the learning table (significant memory overhead) and a corresponding increase in the exploration time needed to learn the most appropriate thread allocation and core frequency for a particular application workload. To confine the learning space and minimize the learning cost, the proposed runtime manager is implemented as a two-stage hierarchy: heuristic-based thread allocation at a longer time interval to improve thermal cycling, followed by learning-based hardware frequency selection at a much finer interval to improve average temperature, peak temperature, and energy consumption. This enables finer control of temperature in an energy-efficient manner while also addressing scalability, a crucial aspect for multi-/many-core embedded systems. The proposed hierarchical runtime manager is implemented for Linux running on NVIDIA's Tegra SoC, featuring four ARM Cortex-A15 cores.
Experiments conducted with a range of embedded and CPU-intensive applications demonstrate that the proposed runtime manager not only reduces energy consumption by an average of 15% with respect to Linux but also improves all three thermal aspects, reducing average temperature by 14°C, peak temperature by 16°C, and thermal cycling by 54%.
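The two-stage hierarchy described in the abstract, and the reason it shrinks the learning table, can be illustrated with a small simulation. The following is a minimal Python sketch under loudly stated assumptions: the toy thermal/power model, the frequency levels, the thread count, the reward weights, and the temperature-bin state are all hypothetical; tabular Q-learning is assumed as the learning algorithm; and `allocate_threads` (place threads on the coolest cores) is a stand-in heuristic, not the paper's thermal-cycling heuristic. The helpers `flat_table_size` and `hierarchical_table_size` show the learning-space arithmetic: learning allocation and frequency jointly needs `NUM_CORES ** num_threads` allocations times the number of frequency levels per state, whereas the hierarchy learns only over frequency levels.

```python
import random

# --- Hypothetical parameters (illustrative assumptions, not taken from the paper) ---
NUM_CORES = 4                          # four Cortex-A15 cores, as in the abstract
FREQS_GHZ = [0.8, 1.2, 1.6, 2.0]       # hypothetical cluster-wide DVFS levels
NUM_THREADS = 2                        # hypothetical application thread count
PERF_TARGET = 2.0                      # required throughput, toy units (GHz x busy cores)
T_AMB, R_TH, ALPHA = 40.0, 12.0, 0.2   # ambient (°C), thermal resistance (°C/W), step gain
T_PEAK_LIMIT = 70.0                    # hypothetical peak-temperature threshold (°C)
EPOCH_S = 0.1                          # length of one fine-grained epoch (s)
COARSE_EVERY = 50                      # stage 1 runs once every 50 stage-2 epochs
NUM_TEMP_STATES = 5                    # hottest-core temperature in 10 °C bins


def flat_table_size(num_threads):
    """Q-table entries if thread allocation and frequency were learned jointly."""
    allocations = NUM_CORES ** num_threads           # every thread-to-core mapping
    return NUM_TEMP_STATES * allocations * len(FREQS_GHZ)


def hierarchical_table_size():
    """Q-table entries when only the frequency is learned (allocation is heuristic)."""
    return NUM_TEMP_STATES * len(FREQS_GHZ)


def allocate_threads(core_temps):
    """Stage 1 stand-in heuristic (hypothetical): run the threads on the coolest cores."""
    coolest = sorted(range(NUM_CORES), key=lambda c: core_temps[c])
    return set(coolest[:NUM_THREADS])


def step_plant(core_temps, busy, freq):
    """Toy first-order thermal model; returns new core temperatures and chip power (W)."""
    powers = [0.2 + (freq ** 2 if c in busy else 0.0) for c in range(NUM_CORES)]
    temps = [t + ALPHA * (T_AMB + R_TH * p - t) for t, p in zip(core_temps, powers)]
    return temps, sum(powers)


def temp_state(core_temps):
    """Discretize the hottest core's temperature into NUM_TEMP_STATES bins."""
    return min(int((max(core_temps) - T_AMB) // 10), NUM_TEMP_STATES - 1)


def reward(freq, power, core_temps):
    """Large penalty for a missed performance target; otherwise penalize energy,
    average temperature, and any overshoot of the peak-temperature limit."""
    if freq * NUM_THREADS < PERF_TARGET:
        return -100.0
    energy_j = power * EPOCH_S
    avg_t = sum(core_temps) / NUM_CORES
    overshoot = max(0.0, max(core_temps) - T_PEAK_LIMIT)
    return -(10.0 * energy_j + 0.1 * (avg_t - T_AMB) + overshoot)


def run(epochs=20000, lr=0.1, gamma=0.5, eps=0.1, seed=0):
    """Two-stage loop: heuristic allocation (coarse) + epsilon-greedy Q-learning DVFS (fine)."""
    rng = random.Random(seed)
    q = [[0.0] * len(FREQS_GHZ) for _ in range(NUM_TEMP_STATES)]  # Q[state][freq index]
    temps = [T_AMB] * NUM_CORES
    busy = set()
    for k in range(epochs):
        if k % COARSE_EVERY == 0:                     # stage 1: longer interval
            busy = allocate_threads(temps)
        s = temp_state(temps)                         # stage 2: every epoch
        if rng.random() < eps:
            a = rng.randrange(len(FREQS_GHZ))         # explore
        else:
            a = max(range(len(FREQS_GHZ)), key=lambda i: q[s][i])  # exploit
        temps, power = step_plant(temps, busy, FREQS_GHZ[a])
        r = reward(FREQS_GHZ[a], power, temps)
        s_next = temp_state(temps)
        q[s][a] += lr * (r + gamma * max(q[s_next]) - q[s][a])
    return q


if __name__ == "__main__":
    print(flat_table_size(NUM_THREADS), hierarchical_table_size())  # 320 20
    print(flat_table_size(8))                                       # 1310720
    for s, row in enumerate(run()):
        if any(row):                                  # skip never-visited states
            best = max(range(len(row)), key=lambda i: row[i])
            print(f"state {s}: {FREQS_GHZ[best]} GHz")
```

With these toy numbers, the joint table already needs 320 entries for two threads and 1,310,720 for eight, while the frequency-only table stays at 20 regardless of thread count. Because the 0.8 GHz level misses `PERF_TARGET`, the agent learns to avoid it; how the remaining levels trade energy against temperature depends entirely on the hypothetical reward weights.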

Hermes: Improving Server Utilization by Colocation-Aware Runtime Systems

Zeus: Improving Resource Efficiency Via Workload Colocation for Massive Kubernetes Clusters

Houdini's Escape

Memory at Your Service: Fast Memory Allocation for Latency-critical Services

Effectively Mitigating I/O Inactivity in vCPU Scheduling

Provably Efficient Resource Allocation for Edge Service Entities Using Hermes

Intelligent colocation of HPC workloads

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Practical Scheduling for Real-World Serverless Computing

Toward Low-Overhead Inter-Switch Coordination in Network-Wide Data Plane Program Deployment

Rethinking and Optimizing Workload Redistribution in Large-scale Internet Data Centers

Hermes: Enhancing Extensibility in High-Level Synthesis Through Multi-Level IRs

Workload Behavior Driven Memory Subsystem Design for Hyperscale

Adaptive and Hierarchical Runtime Manager for Energy-Aware Thermal Management of Embedded Systems

Latency Optimization for Resource Allocation in Cloud Computing System

Scheduling Parallelizable Task in Self-Organized Cloudlet Using Hermes

Online Resource Management in Thermal and Energy Constrained Heterogeneous High Performance Computing

Understanding and Optimizing Serverless Workloads in CXL-Enabled Tiered Memory

Toward a Dynamic Allocation Strategy for Deadline‐Oriented Resource and Job Management in HPC Systems

Runtime Model Based Approach to Managing Diverse Cloud Resources