Lifting the Veil on the Large Language Model Supply Chain: Composition, Risks, and Mitigations

Kaifeng Huang, Bihuan Chen, You Lu, Susheng Wu, Dingji Wang, Yiheng Huang, Haowen Jiang, Zhuotong Zhou, Junming Cao, Xin Peng
2024-10-31
Abstract: Large language models (LLMs) have had a significant impact on both intelligence and productivity. Recent years have seen a surge in the release of both commercial and open-source LLMs, and many businesses have adopted LLMs in their applications to solve their own domain-specific tasks. However, integrating LLMs into specific business scenarios requires more than just using the models themselves. Instead, it is a systematic process that involves many components, which are collectively referred to as the LLM supply chain. The LLM supply chain inherently carries risks. Therefore, it is essential to understand the types of components that may be introduced into the supply chain and their associated risks, so that different stakeholders can implement effective mitigation measures. While some literature discusses risks associated with LLMs, no paper currently outlines the LLM supply chain clearly from the perspective of both providing and consuming its components. As LLMs have become essential infrastructure in the new era, we believe that a thorough review of the LLM supply chain, along with its inherent risks and mitigation strategies, would help industry practitioners avoid potential damages and losses, and would encourage academic researchers to rethink existing approaches and explore new avenues of research. Our paper provides a comprehensive overview of the LLM supply chain, detailing its stakeholders, constituent artifacts, and supply types. We developed taxonomies of risk types, risky actions, and mitigations related to various supply chain stakeholders and components. In summary, our work explores the technical and operational aspects of the LLM supply chain, offering insights for researchers and engineers in the evolving LLM landscape.
Software Engineering
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores the composition of the large language model (LLM) supply chain, the risks it carries, and the measures that can mitigate those risks. Specifically, it focuses on the following aspects:

1. **Composition of the Supply Chain**
   - It describes in detail the components of the LLM supply chain, including data, models, prompts, and code.
   - It explains the roles of the stakeholders in the supply chain, such as contributors, consumers, and administrators.
2. **Risks in the Supply Chain**
   - It analyzes the potential risks in the LLM supply chain, including output risks, privacy attacks, prompt attacks, and toolchain attacks.
   - By categorizing and summarizing these risks, it systematically outlines each risk type and its potential impact.
3. **Risk Mitigation Measures**
   - It catalogs mitigation measures for different types of risks, such as input purification, output purification, and data cleaning.
   - It provides concrete action guidelines for the different stakeholders in the supply chain to reduce potential damage and loss.
4. **Comprehensive Overview**
   - It offers a comprehensive overview of the LLM supply chain, covering its stakeholders, constituent artifacts, and supply types.
   - It develops taxonomies of risk types, risky actions, and mitigations, providing a reference for practitioners and researchers.

### Background and Motivation

As large language models are widely adopted in both commercial and open-source settings, integrating them into specific business scenarios requires more than the models themselves: it is a systematic process involving multiple components and technologies. This process also introduces new risks, such as data leakage and malicious attacks. Although some literature discusses risks related to LLMs, there is no systematic analysis of the entire supply chain together with comprehensive mitigation strategies. This paper therefore aims to fill that gap, helping industry practitioners and academic researchers better understand and manage risks in the LLM supply chain.