Abstract: With the development of multi-core platforms and cloud computing, the Non-Uniform Memory Access (NUMA) architecture has become dominant in cloud data centers in recent years. However, NUMA is not well supported in virtualized environments. Because of the semantic gap introduced by the virtualization layer, hypervisors know little about the characteristics of the applications running in virtual machines (VMs). More importantly, to keep hypervisors generally applicable, the load-balancing strategies of virtual CPU (VCPU) schedulers do not consider the memory access characteristics of applications running in VMs, which can introduce significant shared resource contention and unnecessary remote memory accesses.

In this paper, we propose vProbe, a NUMA-aware VCPU scheduler based on Xen, to improve the performance of memory-intensive applications while maintaining the transparency of the virtualization layer in NUMA-based servers. vProbe collects performance monitoring unit (PMU) data for each VCPU and analyzes its memory access characteristics. Based on these characteristics, it periodically redistributes all memory-intensive VCPUs evenly across the NUMA nodes while preferentially assigning each VCPU to its local node, with the aim of alleviating shared resource contention and reducing unnecessary remote memory accesses. Moreover, when a physical CPU (PCPU) becomes idle, it preferentially steals a VCPU from the run queues of PCPUs in the same node, which helps keep last-level cache (LLC) contention balanced and avoids extra remote memory accesses. Our evaluation shows that vProbe significantly improves the performance of memory-intensive applications (e.g., by up to 45.2% compared with the Credit scheduler) while introducing negligible overhead.
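
The two policies described in the abstract (periodic partitioning of memory-intensive VCPUs and node-local work stealing) can be illustrated with a minimal sketch. This is not vProbe's implementation. The names `VCPU`, `partition`, and `pick_steal_victim`, the LLC-miss-rate metric, the threshold value, and the per-node cap of ceil(n / nodes) are all assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class VCPU:
    vid: int
    home_node: int        # node assumed to hold most of this VCPU's memory
    llc_miss_rate: float  # hypothetical PMU-derived metric, e.g. LLC misses per 1k instructions

MEM_INTENSIVE_THRESHOLD = 10.0  # hypothetical cutoff, not taken from the paper

def partition(vcpus, num_nodes, threshold=MEM_INTENSIVE_THRESHOLD):
    """Map each memory-intensive VCPU to a node: local node if it has room, else the least-loaded node."""
    hot = sorted((v for v in vcpus if v.llc_miss_rate >= threshold),
                 key=lambda v: -v.llc_miss_rate)
    cap = -(-len(hot) // num_nodes)  # ceil(len(hot) / num_nodes): per-node limit
    load = [0] * num_nodes
    placement = {}
    for v in hot:
        node = v.home_node
        if load[node] >= cap:
            # Local node is full: fall back to the node with the fewest hot VCPUs.
            node = min(range(num_nodes), key=lambda n: load[n])
        placement[v.vid] = node
        load[node] += 1
    return placement

def pick_steal_victim(idle_pcpu, run_queues, pcpu_node):
    """Return (victim_pcpu, vcpu_id) for an idle PCPU, trying same-node run queues first."""
    local = pcpu_node[idle_pcpu]
    # Same-node PCPUs sort before remote ones; longer queues first within each group.
    candidates = sorted(
        (p for p in run_queues if p != idle_pcpu and run_queues[p]),
        key=lambda p: (pcpu_node[p] != local, -len(run_queues[p])),
    )
    if not candidates:
        return None  # nothing runnable anywhere
    victim = candidates[0]
    return victim, run_queues[victim][-1]  # take from the tail of the victim's queue

if __name__ == "__main__":
    vcpus = [VCPU(0, 0, 30.0), VCPU(1, 0, 25.0), VCPU(2, 0, 20.0),
             VCPU(3, 1, 1.0), VCPU(4, 0, 15.0)]
    print(partition(vcpus, num_nodes=2))
    print(pick_steal_victim(0, {0: [], 1: [7], 2: [8, 9, 10], 3: []},
                            {0: 0, 1: 0, 2: 1, 3: 1}))
```

In this sketch, VCPUs below the threshold are left to the default scheduler and do not appear in the placement. The cap bounds how many memory-intensive VCPUs any one node can receive, which is one simple way to approximate "evenly"; a real scheduler would also need to weigh migration cost and the actual page distribution.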
