Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model

Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong

2024-09-04

Abstract: For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models (SLMs) targeted for deployment in resource-constrained devices where energy efficiency is a significant concern. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a power normalization method using bit-shifting (BSPN), both designed to replace the respective energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieved a highly compressed binary weight model that maintains competitive performance while significantly reducing energy consumption. We validate Sorbet's effectiveness through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference.

Neural and Evolutionary Computing, Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses energy efficiency and hardware compatibility when deploying language models on resource-constrained devices. The authors propose Sorbet, a transformer-based spiking neural network (SNN) language model. Softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, so Sorbet replaces them with two alternatives: PTsoftmax, a shifting-based softmax, and BSPN, a power normalization method that uses bit-shifting. Combining knowledge distillation with model quantization, Sorbet compresses the model to binary weights, which reduces model size and computational cost while maintaining competitive performance at significantly lower energy consumption. The authors validate the model on the GLUE benchmark and through a series of ablation studies. Overall, the main contribution is identifying transformer operators suited to neuromorphic hardware and proposing two plug-in replacements that let SNNs run without relying on complex functions.
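To make the idea of shift-friendly replacements concrete, the following is a minimal sketch and not the paper's formulation. It assumes that "shifting-based" can be approximated by restricting exponentials and scaling factors to powers of two, so that each multiplication or division becomes a bit shift in fixed-point hardware. The function names `shift_softmax` and `shift_power_norm` are hypothetical, and the normalization uses per-vector statistics purely for illustration.

```python
import numpy as np

def shift_softmax(x):
    """Softmax-like map built from powers of two (illustrative assumption)."""
    x = np.asarray(x, dtype=np.float64)
    # Subtract the max so every exponent is <= 0, then round to an integer.
    k = np.round(x - x.max()).astype(np.int64)
    # 2**k with integer k <= 0 corresponds to a right shift in fixed point.
    num = 2.0 ** k
    # Replace the exact sum by the next power of two at or above it,
    # so the division is also a shift.
    shift = int(np.ceil(np.log2(num.sum())))
    return num / 2.0 ** shift

def shift_power_norm(x, eps=1e-8):
    """Scale by the power of two nearest the root mean square (assumption)."""
    x = np.asarray(x, dtype=np.float64)
    rms = np.sqrt(np.mean(x ** 2) + eps)
    # Round log2(rms) to an integer so the division becomes a shift.
    shift = int(np.round(np.log2(rms)))
    return x / 2.0 ** shift

print(shift_softmax([2.0, 1.0, 0.0]))            # [0.5 0.25 0.125]
print(shift_power_norm([4.0, -4.0, 4.0, -4.0]))  # [ 1. -1.  1. -1.]
```

Under these assumptions the softmax outputs preserve the ordering of the inputs but sum to a value in (0.5, 1] rather than exactly 1, which is the accuracy cost of trading a true division for a shift.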

Towards Energy-Preserving Natural Language Understanding with Spiking Neural Networks

SpikingMiniLM: Energy-Efficient Spiking Transformer for Natural Language Understanding

A Sparsity-Adapted Hardware Implementation of SNN for Cortical Spike Trains Decoding

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Language Modeling on a SpiNNaker 2 Neuromorphic Chip

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation

Exploring Extreme Quantization in Spiking Language Models

BrainTransformers: SNN-LLM

Compiling Spiking Neural Networks to Neuromorphic Hardware

SNNLP: Energy-Efficient Natural Language Processing Using Spiking Neural Networks

A Scatter-and-Gather Spiking Convolutional Neural Network on a Reconfigurable Neuromorphic Hardware

FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

sBSNN: Stochastic-Bits Enabled Binary Spiking Neural Network with On-Chip Learning for Energy Efficient Neuromorphic Computing at the Edge

Cerebron: A Reconfigurable Architecture for Spatiotemporal Sparse Spiking Neural Networks

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Efficient Implementation of Spiking Neural Networks for Inference Using Ex-Situ Training

Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM

Synaptic Activity and Hardware Footprint of Spiking Neural Networks in Digital Neuromorphic Systems