SpiderScan: Practical Detection of Malicious NPM Packages Based on Graph-Based Behavior Modeling and Matching
Yiheng Huang, Ruisi Wang, Wen Zheng, Zhuotong Zhou, Susheng Wu, Shulin Ke, Bihuan Chen, Shan Gao, Xin Peng
DOI: https://doi.org/10.1145/3691620.3695492
2024-01-01
Abstract: Open source software (OSS) supply chains have become attractive targets for attacks. One significant and common attack vector is malicious packages published to package registries. NPM, the largest package registry, has recently been flooded with malicious packages. In response to this severe security risk, many detection tools have been proposed. However, these tools do not model malicious behavior holistically, consider only a predefined set of sensitive APIs, and require substantial manual confirmation effort because of high false positive rates and binary detection results. Their practical usefulness is therefore limited. To address these limitations, we propose SpiderScan, a practical tool that identifies malicious NPM packages through graph-based behavior modeling and matching. In the offline phase, given a set of malicious packages, SpiderScan models each malicious behavior as a graph that captures control flows and data dependencies across sensitive API calls, and leverages an LLM to recognize sensitive APIs in both built-in modules and third-party dependencies. In the online phase, given a target package, SpiderScan constructs its suspicious behavior graphs, matches them against the malicious behavior graphs, and uses dynamic analysis and an LLM to confirm maliciousness only for certain malicious behaviors. Our extensive evaluation demonstrates the effectiveness of SpiderScan over the state of the art. SpiderScan has detected 249 new malicious packages in NPM and received 70 thank-you letters from the official NPM team.
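To make the matching idea concrete, the following is a minimal sketch of comparing a suspicious behavior graph against a known malicious one. It is an illustration under simplifying assumptions, not SpiderScan's actual algorithm: graphs are reduced to sets of labeled edges between sensitive API calls, and a match is plain edge containment. The names `Edge`, `BehaviorGraph`, `matches`, and the example API labels are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A dependency between two sensitive API calls (hypothetical model)."""
    src: str   # API call at the source, e.g. "os.hostname"
    dst: str   # API call at the destination, e.g. "https.request"
    kind: str  # "control" for control flow, "data" for data dependency


@dataclass(frozen=True)
class BehaviorGraph:
    """A behavior reduced to its set of labeled edges (assumption)."""
    name: str
    edges: frozenset[Edge]


def matches(known: BehaviorGraph, suspicious: BehaviorGraph) -> bool:
    """Return True if every edge of the known malicious graph also
    appears in the suspicious graph. An empty known graph never matches,
    so that it cannot trivially flag every package."""
    return bool(known.edges) and known.edges <= suspicious.edges


# A known malicious behavior: host information flows into a network request.
known = BehaviorGraph(
    "exfiltrate-host-info",
    frozenset({Edge("os.hostname", "https.request", "data")}),
)

# A target package whose graph contains that flow plus an extra one.
target = BehaviorGraph(
    "target-package",
    frozenset({
        Edge("os.hostname", "https.request", "data"),
        Edge("fs.readFileSync", "https.request", "data"),
    }),
)

# A package that only reads and parses a local file.
benign = BehaviorGraph(
    "benign-package",
    frozenset({Edge("fs.readFileSync", "JSON.parse", "data")}),
)

print(matches(known, target))  # True
print(matches(known, benign))  # False
```

Exact edge containment is the simplest possible matching rule. A practical detector would likely need something more tolerant, for example matching on API categories instead of exact names, which is beyond what this sketch attempts.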