Mining Pull Requests to Detect Process Anomalies in Open Source Software Development

Bohan Liu, He Zhang, Weigang Ma, Hongyu Kuang, Yi Yang, Jinwei Xu, Shan Gao, Jian Gao
DOI: https://doi.org/10.1145/3597503.3639196
2024-01-01
Abstract: Trustworthy Open Source Software (OSS) development processes are the basis that secures the long-term trustworthiness of software projects and products. To investigate the trustworthiness of the Pull Request (PR) process, the common model of collaborative development in the OSS community, we use process mining to identify and analyze the normal and anomalous patterns of PR processes. We propose an approach that identifies anomalies from both control-flow and semantic aspects, and then analyzes and synthesizes the root causes of the identified anomalies. We analyze 17531 PRs of 18 OSS projects on GitHub, extracting 26 root causes of control-flow anomalies and 19 root causes of semantic anomalies. We find that PRs rarely contain both semantic anomalies and control-flow anomalies, and that the internal custom rules of projects may be the key causes of the identified anomalous PRs. We further discover and analyze the patterns of normal PR processes. We find that PRs in the non-fork model (42%) are far more likely than those in the fork model (5%) to bypass the review process, indicating a higher potential risk. In addition, we analyze nine poisoned projects, whose PR practices were indeed worse. Given the complex and diverse PR processes in the OSS community, the proposed approach can help identify and understand not only anomalous PRs but also normal PRs, which offers early risk indications of suspicious incidents (such as poisoning) to the OSS supply chain.
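To make the review-bypass finding concrete, the following is a minimal Python sketch of one possible control-flow check: flagging PRs whose event trace reaches a merge without any earlier review activity, and computing the bypass rate per development model. It is not the authors' process-mining pipeline. The activity names (`open`, `review`, `merge`), the sample traces, and the choice of merged PRs as the denominator are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical event log: each PR is (model, ordered list of activity names).
# Activity names and sample data are assumptions, not the paper's dataset.
PRS = [
    ("fork", ["open", "review", "merge"]),
    ("fork", ["open", "comment", "review", "merge"]),
    ("non-fork", ["open", "merge"]),
    ("non-fork", ["open", "review", "merge"]),
]


def bypasses_review(trace):
    """Return True if the PR was merged with no review activity before the merge."""
    if "merge" not in trace:
        return False
    merge_at = trace.index("merge")
    return "review" not in trace[:merge_at]


def bypass_rate_by_model(prs):
    """Fraction of merged PRs per development model that skipped review.

    Using merged PRs as the denominator is an assumption of this sketch.
    """
    merged = defaultdict(int)
    bypassed = defaultdict(int)
    for model, trace in prs:
        if "merge" in trace:
            merged[model] += 1
            if bypasses_review(trace):
                bypassed[model] += 1
    return {model: bypassed[model] / merged[model] for model in merged}


if __name__ == "__main__":
    print(bypass_rate_by_model(PRS))
```

A real analysis would derive the traces from GitHub PR timeline events and would likely need project-specific rules, since the abstract notes that internal custom rules may explain many anomalous PRs.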