PyGuard: Finding and Understanding Vulnerabilities in Python Virtual Machines

Chengman Jiang, Baojian Hua, Wanrong Ouyang, Qiliang Fan, Zhizhong Pan
DOI: https://doi.org/10.1109/issre52982.2021.00055
2021-01-01
Abstract: Python has become one of the most popular programming languages in the era of data science and machine learning, and is also widely deployed in safety-critical fields such as medical treatment and autonomous driving. However, CPython, the official and most widely used Python virtual machine, is implemented in C, and existing research has shown that its native code is highly vulnerable, which defeats Python's guarantee of safety and security. This paper presents the design and implementation of PyGuard, a novel software prototype for finding and understanding real-world security vulnerabilities in CPython virtual machines. With PyGuard, we carried out an empirical study of 10 different versions of the CPython virtual machine (from version 3.0 to the latest, 3.9). By scanning a total of 3,358,391 lines of native code, we identified 598 new vulnerabilities. Based on our study, we describe a taxonomy for classifying vulnerabilities in CPython virtual machines. Our taxonomy provides guidance for constructing automated and accurate bug-finding tools. We also suggest systematic remedies that can mitigate the threats posed by these vulnerabilities.