Abstract: The remarkable progress in artificial neural networks (ANNs) has brought significant advances to many application domains, such as pattern recognition, image classification, and computer vision. An ANN imitates the behavior of neurons and makes decisions or predictions by learning patterns and features from a given data set. To reach higher accuracy, neural networks are getting deeper, and consequently the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based deep neural network (DNN) accelerators are being designed, which usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves many memory accesses due to frequent loading and off-loading of data, which significantly increases energy consumption and latency. Also, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, the Network-on-Chip-based (NoC-based) DNN has become an emerging design paradigm because the NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator called DNNoC-sim. To support the various operations in modern DNN models, such as convolution and pooling, we first propose a DNN flattening technique that converts diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-offs between different design parameters.
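The abstract does not give the details of the flattening and slicing steps, so the following is only a minimal sketch of the general idea, not DNNoC-sim's actual implementation. It assumes a stride-1 convolution without padding, treats each output element as one independent multiply-accumulate (MAC) task, and splits the tasks into rounds that fit a limited number of processing elements. The names `flatten_conv2d`, `mac`, `slice_tasks`, and `num_pes` are hypothetical.

```python
from typing import List, Tuple

# One MAC task: the inputs and weights needed for a single output value.
MacTask = Tuple[List[float], List[float]]


def flatten_conv2d(image: List[List[float]],
                   kernel: List[List[float]]) -> List[MacTask]:
    """Flatten a stride-1, unpadded 2D convolution into a list of MAC tasks.

    Assumption: each output element becomes one task (window, weights).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    weights = [w for row in kernel for w in row]
    tasks: List[MacTask] = []
    for r in range(out_h):
        for c in range(out_w):
            window = [image[r + i][c + j]
                      for i in range(kh) for j in range(kw)]
            tasks.append((window, weights))
    return tasks


def mac(inputs: List[float], weights: List[float]) -> float:
    """Multiply-accumulate: the single operation a processing element runs."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def slice_tasks(tasks: List[MacTask], num_pes: int) -> List[List[MacTask]]:
    """Slice the task list into rounds of at most num_pes tasks each."""
    return [tasks[i:i + num_pes] for i in range(0, len(tasks), num_pes)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

tasks = flatten_conv2d(image, kernel)      # 4 output elements, so 4 MAC tasks
rounds = slice_tasks(tasks, num_pes=3)     # only 3 PEs, so 2 rounds are needed
outputs = [mac(x, w) for rnd in rounds for (x, w) in rnd]
print(outputs)  # [6.0, 8.0, 12.0, 14.0]
```

Under the same assumptions, average pooling fits this task format with every weight set to 1/n for an n-element window. How the simulator handles non-linear operations such as max pooling is not stated in the abstract.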

Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning

An Efficient Algorithm for Mapping Deep Learning Applications on the NoC Architecture

AOME: Autonomous Optimal Mapping Exploration Using Reinforcement Learning for NoC-based Accelerators Running Neural Networks

CCASM: A Computation- and Communication-Aware Scheduling and Mapping Algorithm for NoC-Based DNN Accelerators

Optimized Mapping Spiking Neural Networks onto Network-on-Chip

Efficient Mapping Space Exploration for a Reconfigurable Neural Accelerator

AINNS: All-Inclusive Neural Network Scheduling via Accelerator Formalization

mRNA: Enabling Efficient Mapping Space Exploration for a Reconfigurable Neural Accelerator

HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

Core Placement Optimization for Multi-chip Many-core Neural Network Systems with Reinforcement Learning

Self-optimizing Two-layer Network-on-Chip Based on Dominant Network-Flow Adaption

An optimized mapping algorithm based on simulated annealing for regular NoC architecture

Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect

HAS-RL: A Hierarchical Approximate Scheme Optimized with Reinforcement Learning for NoC-Based NN Accelerators

A NoC-based simulator for design and evaluation of deep neural networks

Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework

An Efficient Task Mapping Algorithm with Power-Aware Optimization for Network on Chip

Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

INDM: Chiplet-Based Interconnect Network and Dataflow Mapping for DNN Accelerators

Efficient Scheduling of Irregular Network Structures on CNN Accelerators

Optimizing Off-Chip Memory Access for Deep Neural Network Accelerator