Abstract: Early exiting has proven effective at accelerating the inference of pre-trained language models such as BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods consider only local information from an individual test sample when computing their exiting indicators, and fail to leverage the global information offered by the sample population. This leads to suboptimal estimation of prediction correctness and, in turn, to erroneous exiting decisions. To bridge this gap, we explore the necessity of effectively combining local and global information to ensure reliable early exiting during inference. To this end, we leverage prototypical networks to learn class prototypes and devise a distance metric between samples and class prototypes, which enables us to use global information when estimating the correctness of early predictions. On this basis, we propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT). DE$^3$-BERT implements a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information to improve the estimation of prediction correctness and make early exiting decisions more reliable. Extensive experiments on the GLUE benchmark demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models under different speed-up ratios with minimal storage or computational overhead, yielding a better trade-off between model performance and inference efficiency. In addition, an in-depth analysis further validates the generality and interpretability of our method.
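The hybrid exiting idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the distance metric or how the two signals are combined, so the nearest-to-second-nearest distance ratio, the weighted sum, and the names `prototype_distance_ratio`, `should_exit`, `weight`, and `threshold` are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector, normalized to [0, 1] (local signal)."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def prototype_distance_ratio(feature, prototypes):
    """Global signal (assumed form): Euclidean distance to the nearest class
    prototype divided by the distance to the second-nearest one.
    Values near 0 mean the sample sits clearly next to a single prototype."""
    dists = sorted(math.dist(feature, proto) for proto in prototypes)
    nearest, second = dists[0], dists[1]
    return nearest / second if second > 0.0 else 1.0


def should_exit(probs, feature, prototypes, weight=0.5, threshold=0.3):
    """Hypothetical hybrid rule: exit at this layer when a weighted sum of the
    entropy-based and distance-based scores falls below a threshold."""
    score = weight * entropy(probs) + (1.0 - weight) * prototype_distance_ratio(
        feature, prototypes
    )
    return score < threshold


# Two class prototypes in a toy 2-D feature space (illustrative values only).
prototypes = [[0.0, 0.0], [4.0, 0.0]]
confident = should_exit([0.98, 0.02], [0.1, 0.0], prototypes)
ambiguous = should_exit([0.55, 0.45], [2.0, 0.0], prototypes)
```

In this toy setup, a sample with a peaked prediction that lies close to one prototype exits early, while a sample with a near-uniform prediction that lies midway between the prototypes continues to the next layer. In practice the threshold would be tuned to reach a target speed-up ratio.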

Accelerating BERT Inference for Sequence Labeling Via Early-Exit

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks

Accelerating BERT inference with GPU-efficient exit prediction

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models

DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks

EarlyBERT: Efficient BERT Training Via Early-bird Lottery Tickets

The Right Tool for the Job: Matching Model and Instance Complexities

Accelerating Large Language Model Inference with Self-Supervised Early Exits

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference

Supplementary Features of BiLSTM for Enhanced Sequence Labeling

SkipBERT: Efficient Inference with Shallow Layer Skipping

Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Early Exiting with Ensemble Internal Classifiers

EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models

COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models