Abstract: To reduce the DMA resources needed by multiple algorithm IPs on an FPGA, we propose a channel-configurable, multiplexed DMA device (CMDMA) for asynchronous and heterogeneous algorithm IPs. First, we abstract the entities and data flow of a CMDMA system into a formal description, which we use to define its functions and analyze its workflow. Based on these functions and this workflow, we design and implement a CMDMA prototype consisting of a software driver (SW) and hardware circuits (HW); the hardware comprises one DMA IP, a configurable input switch (CISwitch), the algorithm IPs, and an asynchronous output switch (AOSwitch). At the HW level, configurability is implemented by CISwitch through a configuration port; at the SW level, a configurable Round-Robin (CRR) algorithm schedules channels and input data. For output, we propose a channel-distinguishable output buffer (ChnDistBuf) that delivers the channel ID and data size to SW before an algorithm IP finishes. Using a double-interrupt coordination method that combines interrupts from ChnDistBuf and from the algorithm IPs, CMDMA can store the complete output data of different algorithm IPs successively. Experiments with 4 heterogeneous matrix multiplication algorithm IPs on a Xilinx Zynq platform show that CMDMA improves the average acceleration rate of a single algorithm IP by about 8%-29% compared with the exclusive method, in which one DMA serves only one algorithm IP, and increases DMA input and output data throughput by about 10-40 MB/s and 5-15 MB/s, respectively, when multiple algorithm IPs run in parallel. Moreover, CMDMA requires 756 additional LUTs and 1219 additional FFs, each about 1% of the resources of the Zynq platform.
In addition, in a test with two CNN algorithm IPs on an MNIST application, an enhanced data-broadcasting function in CMDMA saves 4 s compared with a system with 4 exclusive DMAs running in parallel, while using 3 fewer DMAs and 0.03 W less power.
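
The abstract names a configurable Round-Robin (CRR) algorithm for channel and input data scheduling but does not give its details. The following is only a minimal sketch of the general idea, under the assumption that "configurable" means channels can be enabled or disabled at run time and that the scheduler cycles over the enabled channels with pending input. The class `ConfigurableRoundRobin`, its methods, and the helper `drain` are hypothetical names for illustration, not the paper's driver API.

```python
from collections import deque


class ConfigurableRoundRobin:
    """Toy round-robin scheduler over a configurable set of channels.

    Assumption: one pending-input queue per channel, and a per-channel
    enable flag standing in for the channel configuration.
    """

    def __init__(self, num_channels):
        self.queues = [deque() for _ in range(num_channels)]
        self.enabled = [True] * num_channels
        self.next_channel = 0  # channel to examine first on the next call

    def configure(self, channel, enabled):
        """Enable or disable a channel (hypothetical configuration step)."""
        self.enabled[channel] = enabled

    def submit(self, channel, block):
        """Queue one input data block for the given channel."""
        self.queues[channel].append(block)

    def next_transfer(self):
        """Return (channel, block) for the next transfer, or None if idle."""
        n = len(self.queues)
        for offset in range(n):
            ch = (self.next_channel + offset) % n
            if self.enabled[ch] and self.queues[ch]:
                # Resume the scan after the channel just served.
                self.next_channel = (ch + 1) % n
                return ch, self.queues[ch].popleft()
        return None


def drain(scheduler):
    """Collect transfers until no enabled channel has pending input."""
    order = []
    while (item := scheduler.next_transfer()) is not None:
        order.append(item)
    return order
```

In this sketch a disabled channel keeps its queued blocks, so they are served once the channel is enabled again. Whether the actual CRR behaves this way is not stated in the abstract.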

MCS-DMA: An optimization design of memory controller for DMA transfers in SoC

Design Of A Dynamic Memory Access Scheduler

Design and implementation of an advanced DMA controller on AMBA-based SoC

Design of the DDR SDRAM Controller Integrated by the Data Intensive Computing Architecture

Design of Dual-Shared Dram Controller Based on Switch

A secure SoC architecture design with dual DMA controllers

Affinity-aware DMA Buffer Management for Reducing Off-Chip Memory Access

A Bus Schedule of Multimedia System-On-Chip(M-SoC)

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems

A configurable multiplex data transfer model for asynchronous and heterogeneous FPGA accelerators on single DMA device

Design of an Intelligent Dma System Architecture with Data Pre-Processing

Direct Distributed Memory Access for CMPs

Design and Implementation of an AHB-based High-Speed Memory Access Layer

Design and implementation of a flexible DMA controller in video codec system

Design and Implementation of DMA Transfers in WISHBONE Interface

Hardware Implementation of a High Performance and Low Power DDR2 Controller

Improving System Performance in Heterogeneous MPSoC Systems via Dynamic DRAM Bandwidth Allocation

A Flexible High-Bandwidth Low-Latency Multi-Port Memory Controller

A hybrid memory architecture supporting fine-grained data migration

A High-performance, Energy-efficient Modular DMA Engine Architecture