Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model

Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong
2024-09-04
Abstract: For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models (SLMs) targeted for deployment in resource-constrained devices where energy efficiency is a significant concern. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a power normalization method using bit-shifting (BSPN), both designed to replace the respective energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieved a highly compressed binary weight model that maintains competitive performance while significantly reducing energy consumption. We validate Sorbet's effectiveness through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference.
Neural and Evolutionary Computing, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the energy-efficiency and hardware-compatibility challenges of deploying language models on resource-constrained edge devices. The authors propose Sorbet, a transformer-based spiking neural network (SNN) language model. Standard transformers rely on softmax and layer normalization (LN), both of which are difficult to implement on neuromorphic hardware. Sorbet replaces them with two alternatives: PTsoftmax, a shifting-based softmax, and BSPN, a power normalization method that uses bit-shifting. Knowledge distillation and model quantization then compress the model to binary weights, reducing model size, computational cost, and energy consumption while maintaining competitive performance. The authors validate Sorbet on the GLUE benchmark and through a series of ablation studies. Overall, the main contribution is a set of transformer operators suited to neuromorphic hardware: two plug-in replacements that let SNNs operate without relying on complex functions, combined with extreme compression of the model to binary weights.
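To make the idea of shift-based replacements concrete, the following is a minimal sketch of how a softmax-like and a power-normalization-like operation can be built from shifts and adds alone. It is an illustration under stated assumptions, not the paper's definition of PTsoftmax or BSPN: the function names `shift_softmax` and `shift_power_norm`, the fixed-point width `FRAC_BITS`, the use of base 2 in place of e, and the power-of-two rounding rules are all hypothetical choices made here.

```python
FRAC_BITS = 8  # fixed-point fractional bits (assumed for illustration)


def shift_softmax(scores):
    """Softmax-like weights over integer scores using only shifts and adds.

    Assumption: base 2 replaces e, and the denominator is rounded up to a
    power of two, so the outputs sum to at most 1.0 rather than exactly 1.0.
    """
    top = max(scores)
    one = 1 << FRAC_BITS
    # 2**(x - top) in fixed point is a right shift by (top - x).
    nums = [one >> (top - x) for x in scores]
    total = sum(nums)
    # Smallest k with 2**k >= total, so the division becomes a shift.
    k = (total - 1).bit_length()
    return [(n << FRAC_BITS) >> k for n in nums]


def shift_power_norm(xs):
    """Scale integers by a power-of-two estimate of their root mean square.

    Assumption: statistics are computed per call; a real layer would likely
    use running statistics and learned scale and offset parameters.
    """
    mean_sq = sum(x * x for x in xs) // len(xs)
    # Half the bit length of mean_sq approximates log2 of the RMS.
    s = mean_sq.bit_length() // 2
    return [(x << FRAC_BITS) >> s for x in xs]


print(shift_softmax([3, 1, 0]))         # fixed-point weights, 256 == 1.0
print(shift_power_norm([4, -4, 4, -4]))
```

With `FRAC_BITS = 8`, the value 256 represents 1.0, so `shift_softmax([3, 1, 0])` yields `[128, 32, 16]` (0.5, 0.125, 0.0625) and `shift_power_norm([4, -4, 4, -4])` yields `[256, -256, 256, -256]`. Neither function uses multiplication by a non-constant, division, or exponentiation at inference time apart from the squared sum, which is the kind of property that makes such operators easier to map onto neuromorphic hardware.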