Abstract: Concurrent access to shared data has always been a challenge in developing multi-threaded programs and a performance bottleneck in Chip-Multiprocessor (CMP) systems. The challenge has been exacerbated by the need to scale up processor cores and network bandwidth to meet the low-latency demands of ever-expanding data processing. Existing commercial best-effort Hardware Transactional Memory (HTM) is a common and effective solution. However, its architectural constraints prevent transactions from surviving exceptions and cache overflows and from coexisting with a non-speculative fallback path, leading to unstable performance and declining adoption. In this paper, we propose LockillerTM, which comprises three lightweight mechanisms designed to mitigate the limitations of the best-effort HTM architecture and thereby improve performance stability. The first is a recovery mechanism that supports the dynamic revocation of toxic conflicting requests, dramatically reducing the likelihood of livelocks. The second is the HTMLock mechanism, a hardware and software co-design that allows transactions using HTM and transactions using locks to run concurrently unless they encounter an actual conflict. The third is the switchingMode mechanism, which enables a running transaction to proactively attempt to switch to HTMLock mode in the event of a non-conflict-induced abort. We extend the gem5 infrastructure to validate and evaluate our mechanisms on a 32-core tiled CMP system. Experimental results show that LockillerTM outperforms the coarse-grained locking scheme on the STAMP benchmarks, except for the yada workload, irrespective of thread count and cache size. Furthermore, compared to best-effort HTM and state-of-the-art HTM, our approach achieves average speedups of 1.86x and 1.57x, respectively, across all benchmarks and thread counts under a typical cache size, and maximum speedups of 7.79x and 6.73x, respectively, on high-contention benchmarks under an extreme scenario with only an 8KB L1 cache and 32 threads.
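To make the "non-speculative fallback path" concrete, the following is a minimal Python sketch of the conventional retry-then-lock pattern commonly used with best-effort HTM. It is not the LockillerTM design. It only illustrates the baseline behavior the abstract describes, in which speculation cannot proceed while another thread holds the fallback lock. The names `run_atomic`, `try_htm`, `make_flaky_htm`, and `fallback_lock` are hypothetical, and the hardware transaction is simulated by a callable that aborts a fixed number of times.

```python
import threading

# Hypothetical single global lock guarding the non-speculative path.
fallback_lock = threading.Lock()


def run_atomic(body, try_htm, max_retries=3):
    """Run body atomically: speculate first, then fall back to the lock.

    try_htm is a hypothetical stand-in for one hardware transaction attempt;
    it returns True if the attempt committed and False if it aborted.
    The return value names the path that completed the work.
    """
    for _ in range(max_retries):
        # Conventional designs make transactions subscribe to the fallback
        # lock, so no speculation happens while any thread holds it.
        if fallback_lock.locked():
            continue
        if try_htm(body):
            return "htm"
    # Non-speculative path: always completes, but serializes all threads.
    with fallback_lock:
        body()
    return "lock"


def make_flaky_htm(aborts_before_commit):
    """Build a simulated transaction attempt that aborts a fixed number of times."""
    remaining = [aborts_before_commit]

    def try_htm(body):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # simulated abort: no side effects are applied
        body()            # simulated commit: effects become visible
        return True

    return try_htm


counter = {"value": 0}


def increment():
    counter["value"] += 1


path_a = run_atomic(increment, make_flaky_htm(1))   # commits on the second attempt
path_b = run_atomic(increment, make_flaky_htm(10))  # exhausts retries, takes the lock
print(path_a, path_b, counter["value"])
```

In this sketch every retry is wasted while the lock is held, which is the kind of serialization that, according to the abstract, the HTMLock and switchingMode mechanisms aim to reduce.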

LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory
