Is Your AI Truly Yours? Leveraging Blockchain for Copyrights, Provenance, and Lineage

Yilin Sai, Qin Wang, Guangsheng Yu, H.M.N. Dilum Bandara, Shiping Chen
2024-04-09
Abstract: As Artificial Intelligence (AI) integrates into diverse areas, particularly in content generation, ensuring rightful ownership and ethical use becomes paramount. AI service providers are expected to prioritize responsibly sourcing training data and obtaining licenses from data owners. However, existing studies primarily center on safeguarding static copyrights, which simply treats metadata/datasets as non-fungible items with transferable/trading capabilities, neglecting the dynamic nature of training procedures that can shape an ongoing trajectory.
Cryptography and Security, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
This paper addresses copyright management, provenance, and lineage for the data and models involved in training artificial intelligence (AI) models. As large language models (LLMs) and other AI technologies spread, it is crucial to ensure that data sources are lawful and copyright-compliant. Existing research, however, focuses mainly on static copyright protection and overlooks the dynamic nature of the AI model training process.

#### Main problems

1. **Provenance and lineage of data and models**: Existing methods struggle to track the data sources used in AI model training and how those sources evolve.
2. **Copyright compliance**: AI service providers that train models on third-party data must comply with copyright law and obtain the necessary authorizations.
3. **Transparency and trust**: When the entire training process runs locally or in black-box cloud services, users have no visibility into it and cannot verify the authenticity and legality of data and models.
4. **Flexible license management**: Existing systems struggle to support continuous model retraining and fine-tuning while keeping data sources traceable and copyright-compliant.

To solve these problems, the authors propose IBIS (Intelligent Blockchain-based Integrated System), a blockchain-based framework with the following goals:

- **Seamless integration**: IBIS integrates with existing AI model training processes and supports iterative model retraining and fine-tuning.
- **Adaptability**: IBIS supports continuous model retraining and license updates so that the model stays compliant throughout its life cycle.
- **Traceability**: By deploying an immutable on-chain registry, IBIS records the relationships between datasets and models to ensure provenance.
- **Multi-party signature**: Using the identity management and digital signature functions of a private permissioned blockchain, IBIS provides an efficient and secure multi-party signature workflow.
- **Controllability**: An on-chain access control mechanism ensures that only authorized parties can access training datasets, models, and license information, which protects commercially sensitive material.

Through these features, IBIS aims to give the AI industry a comprehensive solution that makes data and model sources transparent, ensures copyright compliance, and supports flexible license management.
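The traceability, multi-party signature, and controllability goals can be illustrated with a small in-memory sketch. This is not IBIS itself, and the summary above does not describe its data model. Every name here (`Record`, `Registry`, `register`, `read`, `lineage`, `verify`) is hypothetical. A SHA-256 hash chain stands in for blockchain immutability, and approvals are plain party names rather than real digital signatures.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder hash preceding the first entry


@dataclass
class Record:
    """One registry entry: a dataset, model, or licence plus its parents."""
    record_id: str
    kind: str               # "dataset", "model", or "licence"
    parents: tuple          # record_ids this entry was derived from
    signatories: frozenset  # parties whose approval is required
    observers: frozenset    # additional parties allowed to read the entry
    prev_hash: str          # digest of the previous entry in the log
    digest: str = ""

    def compute_digest(self):
        payload = json.dumps([self.record_id, self.kind, list(self.parents),
                              sorted(self.signatories), sorted(self.observers),
                              self.prev_hash])
        return hashlib.sha256(payload.encode()).hexdigest()


class Registry:
    """Append-only, hash-chained log standing in for an on-chain registry."""

    def __init__(self):
        self.log = []
        self.by_id = {}

    def register(self, record_id, kind, parents, signatories, approvals,
                 observers=()):
        if record_id in self.by_id:
            raise ValueError(f"duplicate record: {record_id}")
        # Multi-party signature (simplified): every signatory must approve.
        missing = set(signatories) - set(approvals)
        if missing:
            raise PermissionError(f"missing approvals: {sorted(missing)}")
        # Traceability: an entry may only derive from registered entries.
        unknown = [p for p in parents if p not in self.by_id]
        if unknown:
            raise KeyError(f"unregistered parents: {unknown}")
        prev_hash = self.log[-1].digest if self.log else GENESIS
        rec = Record(record_id, kind, tuple(parents), frozenset(signatories),
                     frozenset(observers), prev_hash)
        rec.digest = rec.compute_digest()
        self.log.append(rec)
        self.by_id[record_id] = rec
        return rec

    def read(self, record_id, party):
        # Controllability: only signatories and observers may read an entry.
        rec = self.by_id[record_id]
        if party not in rec.signatories | rec.observers:
            raise PermissionError(f"{party} may not read {record_id}")
        return rec

    def lineage(self, record_id, party):
        """Return the sorted ids of all ancestors of record_id.

        Access is checked on the starting record only.
        """
        stack = list(self.read(record_id, party).parents)
        seen = set()
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.by_id[current].parents)
        return sorted(seen)

    def verify(self):
        """Recompute the hash chain; False means an entry was altered."""
        prev = GENESIS
        for rec in self.log:
            if rec.prev_hash != prev or rec.digest != rec.compute_digest():
                return False
            prev = rec.digest
        return True


reg = Registry()
reg.register("ds-1", "dataset", [], {"owner"}, {"owner"})
reg.register("lic-1", "licence", ["ds-1"], {"owner", "trainer"},
             {"owner", "trainer"})
reg.register("model-v1", "model", ["ds-1", "lic-1"], {"trainer"},
             {"trainer"}, observers={"auditor"})
reg.register("model-v2", "model", ["model-v1"], {"trainer"}, {"trainer"},
             observers={"auditor"})
print(reg.lineage("model-v2", "auditor"))
```

In this sketch a fine-tuned model (`model-v2`) is registered as a child of its predecessor, so its lineage reaches back to the original dataset and licence. A production system would replace the approval sets with verified signatures and the in-memory list with ledger state.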