Abstract: Read mapping, which maps billions of reads to a reference DNA sequence, poses a significant performance bottleneck in genomic analysis. Current accelerators for read mapping are primarily bounded by intensive, random memory accesses to huge datasets. Near-data processing (NDP) infrastructures promise extremely high bandwidth. However, existing frameworks fail to reach this potential due to poor locality and high redundancy. Our idea is to introduce prediction, based on the insight that candidate mapping positions become predictable when the reference is organized in coarse-grained slices. We present GEM (Genomic Memory), an ultra-efficient near-memory accelerator for read mapping. GEM adopts a novel data-centric framework, named dividing-and-predictive-scattering (DPS), which synthesizes seed-existence information to predict the target mapping locations and thereby reduce memory access redundancy. During preparation, DPS divides the reference into coarse-grained slices and creates predictive filters to assess the likelihood that a read belongs to each slice. During mapping, DPS predicts the target slices and scatters reads to considerably fewer slices than it would without prediction. By employing small on-chip SRAM-based predictors with high accuracy, DPS minimizes unnecessary DRAM access and data movement from remote memory. In essence, DPS trades pre-seeding predictors for localized access patterns and low redundancy, hence achieving high throughput for data-intensive applications. We implement GEM by integrating coarse-grained reconfigurable architectures (CGRAs) in the logic layer of a 3D-stacked DRAM infrastructure, utilizing the massive number of banks as slices. GEM leverages CGRAs for their flexibility in supporting various algorithms tailored to different datasets. Bloom filters serve as the slice predictors, providing an error rate below 1%. Evaluation results demonstrate that GEM reduces memory requests by 95% and alignments by 87%, achieving throughput improvements of 15.3× and 11.0× over compute-centric and broadcast-based baselines, respectively, on the same NDP platform. Overall, GEM achieves a 3.5× throughput improvement and 2.1× better energy efficiency compared to state-of-the-art ASIC accelerators.
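The divide-then-predict flow described in the abstract can be illustrated with a small software sketch. This is not GEM's implementation: it only mimics the idea of building one Bloom filter per reference slice from that slice's seeds (k-mers) and then routing a read only to slices whose filter reports enough seed hits. All names (`BloomFilter`, `build_slice_filters`, `predict_slices`) and all parameters (seed length `k`, slice length, filter size, number of hashes, the `min_hits` threshold) are hypothetical choices made for illustration and are not taken from the paper.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1 << 14, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def seeds(seq, k):
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_slice_filters(reference, slice_len, k):
    """Preparation step: one filter per coarse-grained slice of the reference."""
    filters = []
    for start in range(0, len(reference), slice_len):
        # Extend each slice by k-1 bases so seeds crossing a boundary are kept.
        chunk = reference[start:start + slice_len + k - 1]
        bf = BloomFilter()
        for s in seeds(chunk, k):
            bf.add(s)
        filters.append(bf)
    return filters


def predict_slices(read, filters, k, min_hits):
    """Mapping step: indices of slices whose filter matches >= min_hits seeds."""
    read_seeds = seeds(read, k)
    return [
        idx for idx, bf in enumerate(filters)
        if sum(s in bf for s in read_seeds) >= min_hits
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice("ACGT") for _ in range(4000))
    filters = build_slice_filters(reference, slice_len=1000, k=12)
    read = reference[2100:2160]  # an error-free read drawn from slice 2
    print(predict_slices(read, filters, k=12, min_hits=len(read) - 12 + 1))
```

Because a Bloom filter has no false negatives, the slice that truly contains an error-free read is always among the predicted slices; false positives only cause extra slices to be visited. A real read mapper would use a lower `min_hits` threshold to tolerate sequencing errors and variants, which is a tuning decision this sketch leaves open.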

MeNDA: A Near-Memory Multi-way Merge Solution for Sparse Transposition and Dataflows

A design framework for processing-in-memory accelerator

Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV

BafSP: Co-Design of Compute SRAM and Bit-Aware Data Flip Mitigation with In-Memory Sparsity Detection for SpMM

Near Data Acceleration with Concurrent Host Access

GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering

A sparse matrix vector multiplication accelerator based on high-bandwidth memory

Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow

Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging

SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator

Stream-Based Data Placement for Near-Data Processing with Extended Memory

Practical Near-Data Processing for In-Memory Analytics Frameworks

Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning

G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing

GAS: General-Purpose In-Memory-Computing Accelerator for Sparse Matrix Multiplication

ABNDP: Co-optimizing Data Access and Load Balance in Near-Data Processing

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

HAIMA: A Hybrid SRAM and DRAM Accelerator-in-Memory Architecture for Transformer

MeMPA: A Memory Mapped M-SIMD Co-Processor to Cope with the Memory Wall Issue

HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer

Efficient Processing of Sparse Tensor Decomposition via Unified Abstraction and PE-Interactive Architecture