Abstract: Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions such as associative memory. However, the current approach of using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high-performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200×10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and the compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72× and a power reduction exceeding 5.28× under the same network configuration. Compared with state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task.

Scalable Multi-FPGA HPC Architecture for Associative Memory System
