Abstract: Finetuning on task-specific datasets is a widely embraced paradigm for harnessing the capabilities of pretrained LLMs in downstream tasks. Given the popularity of LLM finetuning and the privacy concerns that accompany it, differentially private (DP) finetuning of pretrained LLMs has attracted increasing attention as a way to safeguard the privacy of task-specific datasets. At the core of designing DP LLM finetuning methods is achieving a satisfactory tradeoff among privacy, utility, and scalability. Most existing methods build upon the seminal work of DP-SGD. Although they push the scalability of DP-SGD to its limit, DP-SGD-based finetuning methods remain limited by the inherent inefficiency of SGD. In this paper, we investigate the potential of DP zeroth-order methods for LLM finetuning, which avoid the scalability bottleneck of SGD by approximating the gradient with a more efficient zeroth-order estimate. Rather than treating the zeroth-order method as a drop-in replacement for SGD, this paper presents a comprehensive theoretical and empirical study. First, we propose a stagewise DP zeroth-order method that dynamically schedules key hyperparameters. This design is grounded in the synergy between the DP random perturbation and the gradient approximation error of the zeroth-order method, and in its effect on the finetuning trajectory. Second, we further enhance scalability by reducing the number of trainable parameters, which are identified by repurposing a data-free pruning technique that requires no additional data or extra privacy budget. We provide theoretical analysis for both proposed methods. We conduct extensive empirical analysis on both encoder-only masked language models and decoder-only autoregressive language models, achieving impressive results in terms of scalability and utility.
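
To make the idea of a DP zeroth-order update concrete, the following is a minimal sketch of one common construction, not necessarily the method proposed in this paper: a two-point finite-difference estimate along a random direction, with the per-example scalar differences clipped and perturbed by Gaussian noise. The function names (`dp_zo_step`, `squared_error`), the toy regression loss, and all hyperparameter values are assumptions chosen for illustration. The stagewise scheduling and pruning described in the abstract are not modeled, and no privacy accounting is performed.

```python
import numpy as np

def dp_zo_step(theta, per_example_loss, batch, rng,
               lr=0.1, mu=1e-3, clip=1.0, sigma=1.0):
    """One illustrative DP zeroth-order update (hypothetical helper).

    per_example_loss(theta, batch) must return one loss value per example.
    Only forward evaluations of the loss are used; no gradients are computed.
    """
    # Random search direction shared by all examples in the batch.
    z = rng.standard_normal(theta.shape)
    # Two forward passes give a per-example directional-derivative estimate.
    loss_plus = per_example_loss(theta + mu * z, batch)
    loss_minus = per_example_loss(theta - mu * z, batch)
    diffs = (loss_plus - loss_minus) / (2.0 * mu)
    # Clip each scalar so that a single example's contribution is bounded.
    clipped = np.clip(diffs, -clip, clip)
    # Gaussian noise scaled to the clipping threshold (assumed mechanism);
    # sigma would have to be calibrated by a privacy accountant in practice.
    noisy_sum = clipped.sum() + sigma * clip * rng.standard_normal()
    g_scalar = noisy_sum / len(clipped)
    # Move along the sampled direction by the privatized scalar.
    return theta - lr * g_scalar * z

def squared_error(theta, batch):
    """Toy per-example loss for a linear model (illustration only)."""
    X, y = batch
    return 0.5 * (X @ theta - y) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 10))
    y = X @ np.ones(10)
    theta = np.zeros(10)
    for _ in range(500):
        theta = dp_zo_step(theta, squared_error, (X, y), rng,
                           lr=0.01, clip=5.0, sigma=1.0)
    print("mean loss after training:", squared_error(theta, (X, y)).mean())
```

In this sketch only a single scalar per step is privatized, rather than a full gradient vector, which is one reason zeroth-order approaches are attractive for scalability. How the noise scale, step size, and smoothing parameter `mu` should be scheduled across stages is what the paper's stagewise method addresses.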

Private Fine-tuning of Large Language Models with Zeroth-order Optimization

DPZero: Private Fine-Tuning of Language Models without Backpropagation

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

An Efficient DP-SGD Mechanism for Large Scale NLP Models

Fine-Tuning Large Language Models with User-Level Differential Privacy

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM

Differentially Private Fine-tuning of Language Models

Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners

DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction

Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Large Language Models Can Be Strong Differentially Private Learners

Differentially Private Optimization on Large Model at Small Cost

Zero redundancy distributed learning with differential privacy

Differentially Private Fine-Tuning of Diffusion Models

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Differentially Private Bias-Term Fine-tuning of Foundation Models

Improving Differentially Private SGD via Randomly Sparsified Gradients

A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

DP-FP: Differentially Private Forward Propagation for Large Models

DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction