Abstract: The astonishing development in the field of artificial neural networks (ANNs) has brought significant advances in many application domains, such as pattern recognition, image classification, and computer vision. ANNs imitate the behavior of neurons and make decisions or predictions by learning patterns and features from a given data set. To reach higher accuracy, neural networks are getting deeper, and consequently, the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based deep neural network (DNN) accelerators are being designed; they usually incorporate customized processing elements, a fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves a large number of memory accesses due to the frequent loading and off-loading of data, which significantly increases energy consumption and latency. Moreover, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN accelerators have become an emerging design paradigm, because the NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator called DNNoC-sim. To support the various operations in modern DNN models, such as convolution and pooling, we first propose a DNN flattening technique that converts diverse DNN operations into multiply-accumulate (MAC)-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-offs between different design parameters.
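The abstract states that DNN flattening turns convolution, pooling, and similar operations into MAC-like operations, but it does not give the procedure. The Python sketch below illustrates one plausible reading under our own assumptions: every output neuron of a layer becomes a single task made of (input index, weight) pairs plus an accumulation rule, so a processing element only ever runs a multiply-accumulate, with max pooling handled as a compare-and-keep variant. The `NeuronTask` type and the `flatten_fc`, `flatten_conv2d`, `flatten_pool2d`, and `run_task` helpers are hypothetical names, and the single-channel, no-padding layer shapes are simplifications; this is not the DNNoC-sim implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical sketch of "DNN flattening"; not the DNNoC-sim implementation.
# Each output neuron becomes one MAC-like task: (input index, weight) pairs
# plus an accumulation rule ("mac" = sum of products, "max" = keep largest).


@dataclass
class NeuronTask:
    inputs: List[Tuple[int, float]]  # (row-major input index, weight)
    op: str                          # "mac" or "max"


def flatten_fc(weight_rows: Sequence[Sequence[float]]) -> List[NeuronTask]:
    """Fully connected layer: output j reads every input i with weight W[j][i]."""
    return [NeuronTask(list(enumerate(row)), "mac") for row in weight_rows]


def flatten_conv2d(h: int, w: int, k: int, kernel: Sequence[Sequence[float]],
                   stride: int = 1) -> List[NeuronTask]:
    """Single-channel k x k convolution, no padding: one task per output pixel."""
    out_h, out_w = (h - k) // stride + 1, (w - k) // stride + 1
    tasks = []
    for oy in range(out_h):
        for ox in range(out_w):
            pairs = [((oy * stride + ky) * w + (ox * stride + kx), kernel[ky][kx])
                     for ky in range(k) for kx in range(k)]
            tasks.append(NeuronTask(pairs, "mac"))
    return tasks


def flatten_pool2d(h: int, w: int, k: int, mode: str = "max") -> List[NeuronTask]:
    """Non-overlapping k x k pooling; average pooling is a MAC with weight 1/k^2."""
    tasks = []
    for oy in range(h // k):
        for ox in range(w // k):
            idx = [(oy * k + ky) * w + (ox * k + kx)
                   for ky in range(k) for kx in range(k)]
            if mode == "avg":
                tasks.append(NeuronTask([(i, 1.0 / (k * k)) for i in idx], "mac"))
            else:
                tasks.append(NeuronTask([(i, 1.0) for i in idx], "max"))
    return tasks


def run_task(task: NeuronTask, x: Sequence[float]) -> float:
    """What one processing element computes for a single flattened neuron."""
    if task.op == "mac":
        acc = 0.0
        for i, wt in task.inputs:
            acc += x[i] * wt  # multiply-accumulate
        return acc
    return max(x[i] * wt for i, wt in task.inputs)  # compare-and-keep


if __name__ == "__main__":
    x = list(range(16))  # 4x4 input, row-major
    conv = flatten_conv2d(4, 4, 2, [[1, 0], [0, 1]])
    print([run_task(t, x) for t in conv])
    print([run_task(t, x) for t in flatten_pool2d(4, 4, 2, "max")])
```

For a 4x4 input holding the values 0 to 15 in row-major order, the 2x2 diagonal kernel `[[1, 0], [0, 1]]` yields the nine outputs 5, 7, 9, 13, 15, 17, 21, 23, 25, and 2x2 max pooling yields 5, 7, 13, 15. Because all three layer types share one task format in this sketch, a simulator built this way could model every layer as traffic between PEs that run the same kernel.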
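The abstract also mentions a DNN slicing method for running large-scale models on a resource-constrained NoC, without describing how the model is partitioned. A minimal sketch, assuming the simplest interpretation, cuts each layer's neurons into slices no larger than the mesh can hold at once and executes the slices one after another. The mesh dimensions, the per-PE capacity `neurons_per_pe`, the round-robin placement, and the helper names `slice_layer` and `slices_per_model` are hypothetical choices for illustration, not DNNoC-sim's actual mapping.

```python
from typing import Dict, List

# Hypothetical sketch of "DNN slicing"; not the DNNoC-sim implementation.
# A layer with more neurons than the NoC can hold at once is cut into slices
# that are executed one after another on the same mesh of PEs.


def slice_layer(num_neurons: int, mesh_x: int, mesh_y: int,
                neurons_per_pe: int = 1) -> List[Dict[int, List[int]]]:
    """Return one {pe_id: [neuron ids]} mapping per slice.

    Each slice holds at most mesh_x * mesh_y * neurons_per_pe neurons; inside a
    slice, neurons are placed round-robin on PE ids 0 .. mesh_x * mesh_y - 1.
    """
    num_pes = mesh_x * mesh_y
    capacity = num_pes * neurons_per_pe
    slices = []
    for start in range(0, num_neurons, capacity):
        mapping: Dict[int, List[int]] = {}
        end = min(start + capacity, num_neurons)
        for offset, neuron in enumerate(range(start, end)):
            mapping.setdefault(offset % num_pes, []).append(neuron)
        slices.append(mapping)
    return slices


def slices_per_model(layer_sizes: List[int], mesh_x: int, mesh_y: int,
                     neurons_per_pe: int = 1) -> int:
    """Total number of sequential slices needed to run every layer."""
    return sum(len(slice_layer(n, mesh_x, mesh_y, neurons_per_pe))
               for n in layer_sizes)


if __name__ == "__main__":
    for i, s in enumerate(slice_layer(10, 2, 2, neurons_per_pe=2)):
        print(f"slice {i}: {s}")
```

With 10 neurons on a 2x2 mesh that holds two neurons per PE, the layer needs two slices: the first fills all four PEs, and the second uses only PE 0 and PE 1. Round-robin placement keeps the sketch short; a real mapper would likely also weigh traffic locality and off-chip reloads between slices, which is where the design trade-offs the abstract mentions would come in.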