Abstract: This paper addresses the scheduling problem for multi-dimensional loop applications on heterogeneous multicore processors. A central issue in scheduling multi-dimensional loops is how to hide memory latency so as to reduce the schedule length. As CPU speeds increase, the gap between processor and memory performance has become an important bottleneck for modern high-performance computer systems. To address this bottleneck, a variety of techniques for hiding memory latency have been studied, ranging from intermediate fast memories (caches) to various prefetching and memory management techniques. Although the literature contains many algorithms for scheduling with memory management on multiprocessor systems, they may not produce high-quality, high-performance schedules for heterogeneous multicore processors. In this paper, we first propose a scheduling algorithm, Recom_Task_Assign, that reduces write activities to main memory. Then, in conjunction with Recom_Task_Assign, we present a new partition scheduling algorithm for heterogeneous multicore processors, called heterogeneous multiprocessor partition (HMP). HMP is based on prefetching and can hide memory latencies for applications with multi-dimensional loops. The technique exploits memory access pattern information and fully accounts for the heterogeneity of the processors to achieve high processor utilization. The HMP algorithm selects an appropriate partition size and shape for each processor, which increases processor utilization and reduces memory latency. Experiments on DSP benchmarks show that our algorithm efficiently reduces memory latency and enhances parallelism compared with existing methods.
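The idea of choosing a partition size per processor so that prefetching is hidden behind computation can be illustrated with a small sketch. This is not the HMP algorithm itself, which the abstract does not specify. It is a toy model under stated assumptions: partitions are square, a core spends `cycles_per_iter` cycles on each iteration, and prefetching the data for the next partition costs `2 * side * mem_latency` cycles (that is, the traffic is assumed proportional to the partition boundary). The function name and all parameters are hypothetical.

```python
def min_square_partition(cycles_per_iter, mem_latency, max_side=1024):
    """Return the smallest side length whose compute time covers the
    prefetch time of the next partition, or None if none fits.

    Assumed toy cost model (not taken from the paper):
      compute time  = side * side * cycles_per_iter
      prefetch time = 2 * side * mem_latency
    """
    for side in range(1, max_side + 1):
        compute_time = side * side * cycles_per_iter
        prefetch_time = 2 * side * mem_latency
        if compute_time >= prefetch_time:
            return side
    return None


# Hypothetical heterogeneous cores: cycles needed per loop iteration.
cores = {"fast": 1, "slow": 4}

# Each core gets its own partition size for the same memory latency.
sizes = {
    name: min_square_partition(cycles, mem_latency=20)
    for name, cycles in cores.items()
}
print(sizes)
```

Under this model a faster core needs a larger partition than a slower one to keep its prefetches hidden, which is one reason a single partition size can be a poor fit for heterogeneous processors.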

Iterational retiming with partitioning

Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding

Minimizing Write Operation for Multi-Dimensional DSP Applications Via a Two-Level Partition Technique with Complete Memory Latency Hiding

Performance Optimization for Parallel Systems with Shared DWM Via Retiming, Loop Scheduling, and Data Placement

Loop Scheduling and Partitions for Hiding Memory Latencies

Effective Loop Partitioning and Scheduling under Memory and Register Dual Constraints

Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications

Memory Partitioning and Scheduling Co-Optimization in Behavioral Synthesis

A Recursive Partition-Based In-Memory SIMD Computation Scheduler for Memory Footprint Minimization

Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching

Optimizing Data Placement of Loops for Energy Minimization with Multiple Types of Memories

Iteration Interleaving-Based SIMD Lane Partition

An Efficient Logic Operation Scheduler for Minimizing Memory Footprint of In-Memory SIMD Computation

Optimal Partitioning under Memory Constraints for Minimizing Average Schedule Length

Partitioning and Scheduling DSP Applications with Maximal Memory Access Hiding

Memory partitioning-based modulo scheduling for high-level synthesis

Efficient Memory Partitioning for Parallel Data Access in FPGA via Data Reuse

Variable Partitioning and Scheduling of Multiple Memory Architectures for DSP

Efficient Memory Partitioning For Parallel Data Access Via Data Reuse

Optimizing Data Allocation for Loops on Embedded Systems with Scratch-Pad Memory

Power Aware Variable Partitioning and Instruction Scheduling for Multiple Memory Banks