Abstract: It is important that consumers and regulators can verify the provenance of large neural models to evaluate their capabilities and risks. We introduce the concept of a "Proof-of-Training-Data": any protocol that allows a model trainer to convince a Verifier of the training data that produced a set of model weights. Such protocols could verify the amount and kind of data and compute used to train the model, including whether it was trained on specific harmful or beneficial data sources. We explore efficient verification strategies for Proof-of-Training-Data that are compatible with most current large-model training procedures. These include a method for the model trainer to verifiably pre-commit to a random seed used in training, and a method that exploits models' tendency to temporarily overfit to training data in order to detect whether a given data point was included in training. We show experimentally that our verification procedures can catch a wide variety of attacks, including all known attacks from the Proof-of-Learning literature.

What problem does this paper attempt to address?

The paper addresses how to verify the provenance of the training data behind large machine learning models. Today, users and regulators must rely on trust and reputation when a model provider describes the data it trained on. As the ability to build new AI models spreads, users must trust a growing number of providers, and regulators may face malicious developers who lie to appear compliant with standards and regulations. Worse still, countries developing AI systems of military importance may not trust each other's claims about those systems' capabilities, which makes agreement on limitations hard to reach.

The authors introduce the concept of "Proof-of-Training-Data" (PoTD): any protocol that allows a model trainer to convince a Verifier of the training data that produced a set of model weights. Such a protocol could verify the amount and kind of data and the compute used in training, including whether the model was trained on specific harmful or beneficial data sources. The paper explores efficient verification strategies that are compatible with most current large-model training procedures. These include a method for the model trainer to verifiably pre-commit to the random seed used in training, and a method that detects whether a given data point was included in training by exploiting models' tendency to temporarily overfit to their training data.

In experiments, the authors show that their verification procedures catch a wide variety of attacks, including all known attacks from the Proof-of-Learning literature. In summary, the paper offers a set of verification tools intended to make a trainer's claims about training data checkable rather than a matter of trust.
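To make the idea of verifiably pre-committing to a random seed more concrete, here is a minimal sketch of a generic hash-based commit-and-reveal scheme. It is not the construction used in the paper, which this summary does not describe; the function names `commit_to_seed` and `verify_seed`, the 32-byte nonce, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit_to_seed(seed: int) -> Tuple[str, bytes]:
    """Trainer side (hypothetical helper): commit to a seed before training.

    Returns the commitment to publish and the nonce to keep private
    until the reveal. The random nonce prevents a Verifier from
    brute-forcing a small seed space from the commitment alone.
    """
    nonce = secrets.token_bytes(32)
    # Assumes 0 <= seed < 2**64 so it fits in 8 bytes.
    payload = nonce + seed.to_bytes(8, "big")
    return hashlib.sha256(payload).hexdigest(), nonce


def verify_seed(commitment: str, seed: int, nonce: bytes) -> bool:
    """Verifier side (hypothetical helper): check a revealed seed and nonce."""
    payload = nonce + seed.to_bytes(8, "big")
    recomputed = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(recomputed, commitment)


if __name__ == "__main__":
    commitment, nonce = commit_to_seed(1234)
    print(verify_seed(commitment, 1234, nonce))  # honest reveal
    print(verify_seed(commitment, 1235, nonce))  # seed swapped after the fact
```

In this sketch the trainer publishes the commitment before training starts and reveals the seed and nonce afterwards, so a trainer who later claims a different seed fails the check. A commitment of this kind only fixes the seed value; showing that training actually used that seed requires the further verification steps the paper discusses.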

Tools for Verifying Neural Models' Training Data

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

Set-Based Training for Neural Network Verification

Shared Certificates for Neural Network Verification

Provenance of Training Without Training Data: Towards Privacy-Preserving DNN Model Ownership Verification

What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring

VPN: Verification of Poisoning in Neural Networks

Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Towards Understanding and Enhancing Security of Proof-of-Training for DNN Model Ownership Verification

Neural Network Verification with Proof Production

A Verifiable and Privacy-Preserving Federated Learning Training Framework

ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

Proof-of-Learning: Definitions and Practice

Verification of deep probabilistic models

Quantitative Verification with Neural Networks

Online Verification of Deep Neural Networks under Domain Shift or Network Updates

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

Generate and Verify: Semantically Meaningful Formal Analysis of Neural Network Perception Systems

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Towards a Re-evaluation of Data Forging Attacks in Practice

VerIDeep: Verifying Integrity of Deep Neural Networks through Sensitive-Sample Fingerprinting