Abstract:Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at <a class="link-external link-https" href="https://decodingtrust.github.io/" rel="external noopener nofollow">this https URL</a> ; our dataset can be previewed at <a class="link-external link-https" href="https://huggingface.co/datasets/AI-Secure/DecodingTrust" rel="external noopener nofollow">this https URL</a> ; a concise version of this work is at <a class="link-external link-https" href="https://openreview.net/pdf?id=kaHpo8OZw2" rel="external noopener nofollow">this https URL</a> .

VFFG: Verifiable Privacy-Enhanced Federated Fine-Tuning for GPT Service

SecureGPT: A Framework for Multi-Party Privacy-Preserving Transformer Inference in GPT

EVFL: Towards Efficient Verifiable Federated Learning Via Parameter Reuse and Adaptive Sparsification

PVFL: Verifiable Federated Learning and Prediction with Privacy-Preserving

I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption

Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables

PPFLV: privacy-preserving federated learning with verifiability

Towards Building the Federated GPT: Federated Instruction Tuning

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

FedCG: Leverage Conditional GAN for Protecting Privacy and Maintaining Competitive Performance in Federated Learning

PPT: A Privacy-Preserving Global Model Training Protocol for Federated Learning in P2P Networks

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices

InferDPT: Privacy-Preserving Inference for Black-box Large Language Model

Byzantine-Robust and Privacy-Preserving Framework for FedML

Federated Foundation Models: Privacy-Preserving and Collaborative Learning for Large Models

PPFed: A Privacy-Preserving and Personalized Federated Learning Framework

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

Efficient and Privacy-Preserving Feature Importance-based Vertical Federated Learning

PVD-FL: A Privacy-Preserving and Verifiable Decentralized Federated Learning Framework

Differentially Private Fine-tuning of Language Models

East: Efficient and Accurate Secure Transformer Framework for Inference