Understanding the OSS Communities of Deep Learning Frameworks: A Comparative Case Study of PyTorch and TensorFlow

Yunqi Chen,Zhiyuan Wan,Yifei Zhuang,Ning Liu,David Lo,Xiaohu Yang
DOI: https://doi.org/10.1145/3705303
IF: 3.685
2024-01-01
ACM Transactions on Software Engineering and Methodology
Abstract:Over the past two decades, deep learning has received tremendous success in developing software systems across various domains. Deep learning frameworks have been proposed to facilitate the development of such software systems, among which, PyTorch and TensorFlow stand out as notable examples. Considerable attention focuses on exploring software engineering practices and addressing diverse technical aspects in developing and deploying deep learning frameworks and software systems. Despite these efforts, little is known about the open-source software communities involved in the development of deep learning frameworks. In this paper, we perform a comparative investigation into the open-source software communities of the two representative deep learning frameworks, PyTorch and TensorFlow. To facilitate the investigation, we compile a dataset of 2,792 and 3,288 code commit authors, along with 9,826 and 19,750 participants engaged in issue events on GitHub, from the two communities, respectively. With the dataset, we first characterize the structures of the two communities by employing four operationalizations to classify contributors into various roles and inspect the contributions made by common contributors across the two communities. We then conduct a longitudinal analysis to characterize the evolution of the two communities across various releases, in terms of the numbers of contributors with various roles and role transitions among contributors. Finally, we explore the causal effects between community characteristics and the popularity of the two frameworks. We find that the TensorFlow community harbors a larger base of contributors, encompassing a higher proportion of core developers and a more extensive cohort of active users compared to the PyTorch community. In terms of the technical background of the developers, 64.4% and 56.1% developers in the PyTorch and TensorFlow communities are employed by the leading companies of the corresponding open-source software projects, Meta and Google, respectively. 25.9% and 21.9% core developers in the PyTorch and TensorFlow communities possess Ph.D. degrees, while 77.2% and 77.7% contribute to other machine learning or deep learning open-source projects, respectively. Developers contributing to both communities demonstrate spatial and temporal similarities to some extent in their pull requests across the respective projects. The evolution of contributors with various roles exhibits a consistent upward trend over time in the PyTorch community. Conversely, a noticeable turning point in the growth of contributors characterizes the evolution of the TensorFlow community. Both communities show a statistically significant decreasing trend in the inflow rates of core developers. Furthermore, we observe statistically significant causal effects between the expansion of communities and retention of core developers and the popularity of deep learning frameworks. Based on our findings, we discuss implications, provide recommendations for sustaining open-source software communities of deep learning frameworks, and outline directions for future research.
What problem does this paper attempt to address?