Large Language Model Supply Chain: Open Problems From the Security Perspective

Qiang Hu, Xiaofei Xie, Sen Chen, Lei Ma
DOI: https://doi.org/10.48550/arXiv.2411.01604
2024-11-03
Abstract: Large Language Models (LLMs) are changing the software development paradigm and have gained huge attention from both academia and industry. Researchers and developers collaboratively explore how to leverage the powerful problem-solving ability of LLMs for specific domain tasks. Due to the wide usage of LLM-based applications, e.g., ChatGPT, multiple works have been proposed to ensure the security of LLM systems. However, although a comprehensive understanding of the entire process of LLM system construction (the LLM supply chain, LLM SC) is crucial, relevant works are limited. More importantly, the security issues hidden in the LLM SC, which could highly impact the reliable usage of LLMs, lack exploration. Existing works mainly focus on assuring the quality of LLMs at the model level; security assurance for the entire LLM SC is ignored. In this work, we take the first step to discuss the potential security risks in each component of the LLM SC as well as in the integration between components. We summarize 12 security-related risks and provide promising guidance to help build safer LLM systems. We hope our work can facilitate the evolution of artificial general intelligence with secure LLM ecosystems.
Cryptography and Security, Artificial Intelligence, Software Engineering
What problem does this paper attempt to address?
This paper addresses the security risks in the large language model (LLM) supply chain. Existing research mainly focuses on the quality and security of the LLM itself, and pays far less attention to the security of the supply chain as a whole. The authors point out that the LLM supply chain involves multiple components and parties, such as data providers, model developers, and third-party libraries, and that the dependencies between these components and the security risks they introduce have not been fully explored.

### Main problems of the paper

1. **Lack of comprehensive understanding of the entire LLM supply chain**: Most existing research focuses on security at the model level and ignores the risks that other parts of the supply chain (such as data preparation, model training, and the deployment environment) may introduce.
2. **Security risks in every link of the supply chain**: From data collection to final application deployment, every link may carry security risks, such as data selection attacks, data cleaning bypass, attacks on automatic annotation tools, vulnerabilities in frameworks and third-party libraries, exploitation of training techniques, and distribution conflicts.
3. **Limitations of existing research**: Most research focuses on specific components (such as ChatGPT) or specific tasks and does not consider the security of the entire supply chain.

### Goals of the paper

- **Identify and summarize potential security risks**: By analyzing each link of the LLM supply chain, the authors identify 12 potential security risks and describe each one.
- **Propose mitigation measures**: For these risks, the authors propose mitigation measures and guiding principles to help researchers and developers build more secure LLM systems.
- **Promote the development of more reliable artificial intelligence**: By raising awareness of LLM supply chain security, the authors aim to encourage the development and application of safer and more reliable artificial intelligence systems.

### Main contributions

1. **Took a first step toward exploring the security risks of each component in the LLM supply chain and of the integration between components**, and summarized 12 related security risks.
2. **Provided promising guidelines** to help mitigate these risks and support the development of more secure LLM systems.
3. **Emphasized the importance of the overall security of the supply chain**, rather than the security of the model alone.

Through these efforts, the authors hope their work can promote a more secure LLM ecosystem and thereby facilitate the evolution of artificial general intelligence.