Abstract: In response to the increasing ubiquity of multicore processors, there has been widespread development of multithreaded applications that strive to realize the full potential of these processors. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that an increase in the number of cores can actually lead to reduced performance (i.e., scalability collapse). Existing efforts to solve scalability collapse mainly focus on making critical sections of kernel code fine-grained or on designing new synchronization primitives. However, these methods have disadvantages in scalability or energy efficiency. In this article, we observe that the percentage of lock-waiting time over the total execution time for a lock-intensive task has a significant correlation with the occurrence of scalability collapse. Based on this observation, a lock-contention-aware scheduler is proposed. Specifically, each task in the scheduler continuously monitors its percentage of lock-waiting time. If the percentage exceeds a predefined threshold, the task is considered lock intensive and is migrated to a Special Set of Cores (SSC). In this way, the number of concurrently running lock-intensive tasks is limited to the number of cores in the SSC, and therefore the degree of lock contention is controlled. A central challenge of this scheme is deciding how many cores should be allocated to the SSC to handle lock-intensive tasks. In our scheduler, the optimal number of cores is determined online by a model-driven search. The proposed scheduler is implemented in a recent Linux kernel and evaluated using micro- and macrobenchmarks on AMD and Intel 32-core systems. Experimental results suggest that our proposal removes scalability collapse completely and sustains the maximal throughput of the spin-lock-based system for most applications. Furthermore, the percentage of lock-waiting time can be reduced by up to 84%. When compared with other methods for reducing scalability collapse, such as the requester-based locking scheme and sleeping-based synchronization primitives, our scheme exhibits significant advantages in scalability, power consumption, and energy efficiency.
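
The classification and migration step described in the abstract can be illustrated with a minimal user-space sketch. It is not the authors' kernel implementation: the `Task` class, the `classify` and `place` helpers, the threshold value of 0.5, and the convention that the SSC occupies the lowest-numbered cores are all assumptions made for illustration, and the SSC size is passed in as a fixed parameter rather than found by the model-driven search.

```python
from dataclasses import dataclass

# Hypothetical threshold; the abstract only says it is "predefined".
LOCK_WAIT_THRESHOLD = 0.5


@dataclass
class Task:
    """Toy stand-in for a schedulable task (not a kernel structure)."""
    name: str
    lock_wait_time: float = 0.0
    total_time: float = 0.0
    lock_intensive: bool = False

    def record(self, run_time: float, wait_time: float) -> None:
        """Accumulate one observation interval of running and lock waiting."""
        self.lock_wait_time += wait_time
        self.total_time += run_time + wait_time

    def lock_wait_ratio(self) -> float:
        """Fraction of total execution time spent waiting for locks."""
        if self.total_time == 0:
            return 0.0
        return self.lock_wait_time / self.total_time


def classify(tasks, threshold=LOCK_WAIT_THRESHOLD):
    """Mark every task whose lock-waiting ratio exceeds the threshold."""
    for task in tasks:
        task.lock_intensive = task.lock_wait_ratio() > threshold


def place(tasks, num_cores, ssc_size):
    """Map task name -> core id.

    Assumption: cores 0..ssc_size-1 form the SSC. Lock-intensive tasks
    are spread round-robin over the SSC, all other tasks over the rest,
    so at most ssc_size lock-intensive tasks can run at the same time.
    """
    if not 0 < ssc_size < num_cores:
        raise ValueError("ssc_size must be between 1 and num_cores - 1")
    ssc = list(range(ssc_size))
    normal = list(range(ssc_size, num_cores))
    placement = {}
    n_ssc = n_normal = 0
    for task in tasks:
        if task.lock_intensive:
            placement[task.name] = ssc[n_ssc % len(ssc)]
            n_ssc += 1
        else:
            placement[task.name] = normal[n_normal % len(normal)]
            n_normal += 1
    return placement


# Usage: three tasks on a 4-core machine with a 1-core SSC.
a, b, c = Task("a"), Task("b"), Task("c")
a.record(run_time=2, wait_time=8)   # mostly waiting on locks
b.record(run_time=9, wait_time=1)   # mostly running
c.record(run_time=3, wait_time=7)   # mostly waiting on locks
classify([a, b, c])
placement = place([a, b, c], num_cores=4, ssc_size=1)
```

With a one-core SSC, both lock-intensive tasks share core 0, so only one of them can contend for a lock at any moment, while the remaining task runs on a core outside the SSC.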

Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems

Lock-contention-aware Scheduler

MPI Progress For All

Plock: A Fast Lock for Architectures with Explicit Inter-core Message Passing.

Improving the Efficiency of Deadlock Detection in MPI Programs Through Trace Compression

Avoiding Scalability Collapse by Restricting Concurrency

Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach

Lock Behavior Characterization of Commercial Workloads

Requester-Based Spin Lock: A Scalable and Energy Efficient Locking Scheme on Multicore Systems

Utilizing the Multi-threading Techniques to Improve the Two-Level Checkpoint/Rollback System for MPI Applications

SeqDLM: A Sequencer-Based Distributed Lock Manager for Efficient Shared File Access in a Parallel File System

Contention-aware Lock Scheduling for Transactional Databases