Abstract: Applications running concurrently in chip multiprocessor (CMP) systems interfere with each other at the DRAM memory, leading to poor system performance and fairness. Memory access scheduling reorders memory requests to improve system throughput and fairness, but it cannot resolve the interference effectively. To reduce interference, memory partitioning divides memory resources among threads. Memory channel partitioning (MCP) maps the data of threads that are likely to interfere severely with each other to different channels. However, it allocates memory resources unfairly and physically exacerbates memory contention among memory-intensive threads, which increases the slowdown of these threads and leads to high system unfairness. Bank partitioning divides memory banks among cores and eliminates interference. However, previous equal bank partitioning restricts the number of banks available to each thread and reduces bank-level parallelism. In this paper, we first propose Dynamic Bank Partitioning (DBP), which partitions memory banks according to the number of banks each thread requires. DBP compensates for the reduced bank-level parallelism caused by equal bank partitioning. The key principle is to profile threads' memory characteristics at run time, estimate their demand for banks, and use the estimate to direct bank partitioning. Second, we observe that bank partitioning and memory scheduling are orthogonal, in the sense that each method benefits when the two are applied together. Therefore, we present a comprehensive approach that integrates Dynamic Bank Partitioning with Thread Cluster Memory scheduling (DBP-TCM; TCM is one of the best memory scheduling algorithms) to further improve system performance. Experimental results show that the proposed DBP improves system performance by 4.3% and system fairness by 16% over equal bank partitioning. Compared to TCM, DBP-TCM improves system throughput by 6.2% and fairness by 16.7%. Compared with MCP, DBP-TCM provides 5.3% better system throughput and 37% better system fairness. We conclude that our methods are effective in improving both system throughput and fairness.
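
The abstract describes DBP only at a high level: profile each thread, estimate how many banks it needs, and partition accordingly. It does not give the estimation method or the allocation rule. The Python sketch below is therefore only an illustration under stated assumptions: the per-thread demand is taken as an already-computed number, banks are split in proportion to that number, and the function name `partition_banks` and its parameters are hypothetical rather than taken from the paper.

```python
def partition_banks(demand, total_banks, min_banks=1):
    """Split banks among threads in proportion to an estimated demand.

    demand: dict of thread id -> non-negative demand estimate
            (a hypothetical metric; the abstract does not define one).
    Returns a dict of thread id -> list of bank indices. The lists are
    disjoint, so no two threads share a bank.
    """
    if total_banks < min_banks * len(demand):
        raise ValueError("not enough banks for the requested minimum")

    # Every thread first gets the guaranteed minimum.
    counts = {t: min_banks for t in demand}
    spare = total_banks - min_banks * len(demand)

    # Assumed rule: the remaining banks are shared proportionally to demand
    # (or equally if no thread reports any demand).
    total_demand = sum(demand.values())
    if total_demand > 0:
        shares = {t: spare * d / total_demand for t, d in demand.items()}
    else:
        shares = {t: spare / len(demand) for t in demand}
    for t in demand:
        counts[t] += int(shares[t])

    # Banks lost to rounding go to the largest fractional remainders.
    leftover = total_banks - sum(counts.values())
    by_remainder = sorted(
        demand, key=lambda t: shares[t] - int(shares[t]), reverse=True
    )
    for t in by_remainder[:leftover]:
        counts[t] += 1

    # Hand out contiguous, non-overlapping ranges of bank indices.
    banks, start = {}, 0
    for t in demand:
        banks[t] = list(range(start, start + counts[t]))
        start += counts[t]
    return banks


if __name__ == "__main__":
    # One memory-intensive thread and two light ones sharing 16 banks.
    print(partition_banks({"A": 6.0, "B": 1.0, "C": 1.0}, total_banks=16))
```

Equal bank partitioning corresponds to giving every thread the same demand value; a dynamic scheme would recompute the demand values periodically from run-time profiles and call the function again.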

D-cache Allocation and Security for Simultaneous Multithreading

A Dynamic Resource Allocation Optimization for SMT Processors

A Spatially Triggered Dissipative Resource Distribution Policy for SMT Processors

Performance Evaluation and Optimization of Cache Architecture for Simultaneous Multithreading Processor

Dynamic Partitioning of Scalable Cache Memory for SMT Architectures

Design of non-critical path resource distributor for SMT processors

TADC: Thread-aware Divide-and-Conquer Policy to Manage Shared Cache

Dynamic Cache Reservation to Maximize Efficiency in Shared Cache Multicores

Enhancing the Performance and Fairness of Shared DRAM Systems with Sharing-Aware Scheduling

ARP: An Adaptive Runtime Mechanism to Partition Shared Cache in SMT Architecture

Research of On-chip Resource Distribution Strategies for Simultaneous Multithreaded Architecture

Vscp: A Cache Controlling Method for Improving Single Thread Performance in Multicore System

CMP Thread Assignment Based on Group Sharing L2 Cache

Improving system throughput and fairness simultaneously in shared memory CMP systems via Dynamic Bank Partitioning

A Utility Based Cache Optimization Mechanism for Multi-Thread Workloads

Access Adaptive and Thread-Aware Cache Partitioning in Multicore Systems

Esdmt: Efficient and Scalable Deterministic Multithreading Through Memory Isolation

Integrated Instruction Cache Analysis and Locking in Multitasking Real-Time Systems

Dynamic Simultaneous Multithreaded Architecture

Cache Coherence Method for Improving Multi-threaded Applications on Multicore Systems

Combine thread with memory scheduling for maximizing performance in multi-core systems