Abstract: The growing complexity and diversity of neural networks in autonomous driving and intelligent robotics have spurred research on many-core architectures. Compared with domain-specific architectures, many-core architectures offer enough programming flexibility to support parallel inference of multiple DNNs with different network structures and sizes. However, because of tight area and power constraints, many-core architectures typically use lightweight scalar cores without vector units, and they can hardly meet the high-performance computing demands of multi-DNN parallel inference. To address this problem, we design an area- and energy-efficient many-core architecture that integrates a large number of lightweight processor cores implementing the RV32IMA ISA. The architecture uses emerging SRAM-based computing-in-memory technology to implement vector instruction extensions by reusing the memory cells of the data cache instead of adding conventional logic circuits. The data cache in each core can thus be reconfigured into a memory part and a computing part, with the latter tightly coupled to the core pipeline. This enables parallel execution of the base RISC-V instructions and the extended multi-cycle vector instructions. Furthermore, we propose an execution framework that maps DNN models onto the many-core architecture using intra-layer and inter-layer pipelining, and that potentially supports multi-DNN parallel inference. Experimental results show that the proposed MAICC architecture achieves 4.3× the throughput and 31.6× the energy efficiency of a CPU (Intel Core i9-13900K). MAICC also achieves 1.8× the energy efficiency of a GPU (RTX 4090) with only 4 MB of on-chip memory and 28 mm² of area.
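The abstract does not detail how the execution framework assigns work to cores. As a rough illustration only, the Python sketch below maps the layers of several DNNs onto a pool of cores. Each layer becomes an inter-layer pipeline stage on its own disjoint core group, and a layer's work is split evenly across the cores in its group (intra-layer parallelism), assuming idealized linear speedup with no communication cost. The per-layer costs, the greedy bottleneck-first allocation, and the names `map_models`, `Stage`, and `throughput` are hypothetical and are not MAICC's actual mapping algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage: a single layer of one model on a group of cores."""
    model: str
    layer: int
    cores: list[int]
    latency: float  # layer cost / core count (assumes ideal linear speedup)


def map_models(models: dict[str, list[float]], num_cores: int) -> list[Stage]:
    """Hypothetical mapper: one disjoint core group per layer of every model."""
    layers = [(name, i, cost)
              for name, costs in models.items()
              for i, cost in enumerate(costs)]
    if num_cores < len(layers):
        raise ValueError("need at least one core per layer")

    # Every layer starts with one core; each spare core goes to the layer
    # with the current highest latency (the pipeline bottleneck).
    alloc = [1] * len(layers)
    for _ in range(num_cores - len(layers)):
        worst = max(range(len(layers)), key=lambda k: layers[k][2] / alloc[k])
        alloc[worst] += 1

    # Hand out contiguous core IDs so that core groups never overlap.
    stages, next_core = [], 0
    for (name, i, cost), n in zip(layers, alloc):
        stages.append(Stage(name, i, list(range(next_core, next_core + n)), cost / n))
        next_core += n
    return stages


def throughput(stages: list[Stage], model: str) -> float:
    """Steady-state inter-layer pipeline throughput: 1 / slowest stage."""
    return 1.0 / max(s.latency for s in stages if s.model == model)


if __name__ == "__main__":
    plan = map_models({"A": [8, 4, 4], "B": [6, 2]}, num_cores=12)
    for s in plan:
        print(s.model, s.layer, s.cores, s.latency)
    print(throughput(plan, "A"), throughput(plan, "B"))
```

With two hypothetical models, `map_models({"A": [8, 4, 4], "B": [6, 2]}, 12)` gives core groups of sizes 4, 2, 2, 3, and 1. Every stage then has latency 2, so each model's steady-state pipeline throughput is 0.5 inferences per unit time. Giving spare cores to the current slowest stage is a simple heuristic for balancing pipeline stages. A real framework would likely also account for data movement between cores and per-core cache capacity.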

iCache: An Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training

Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction

An Enhanced Data Cache with In-Cache Processing Units for Convolutional Neural Network Accelerators

High Throughput CNN Inference and Training with In-Cache Computation

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Hoard: A Distributed Data Caching System to Accelerate Deep Learning Training on the Cloud

A Simple Cache Model for Image Recognition

DeepCache: Accelerating Diffusion Models for Free

Fleche: An Efficient GPU Embedding Cache for Personalized Recommendations

JeCache: Just-Enough Data Caching with Just-in-Time Prefetching for Big Data Applications

An Online Approach for DNN Model Caching and Processor Allocation in Edge Computing

DeepCache: Principled Cache for Mobile Deep Vision

Accelerating Convolutional Neural Networks for Continuous Mobile Vision via Cache Reuse

Adaptive Online Cache Capacity Optimization via Lightweight Working Set Size Estimation at Scale

Caching as an Image Characterization Problem using Deep Convolutional Neural Networks

Mixed-Precision Embedding Using a Cache

MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

A Deep Learning Dataloader with Shared Data Preparation

Improving In-Memory File System Reading Performance by Fine-Grained User-Space Cache Mechanisms

A Deep Reinforcement Learning Approach for Dynamic Contents Caching in HetNets