Abstract: Over the past two decades, deep learning has achieved tremendous success in software systems across various domains. Deep learning frameworks have been proposed to facilitate the development of such systems, and PyTorch and TensorFlow stand out as notable examples. Considerable attention has been devoted to exploring software engineering practices and addressing diverse technical aspects of developing and deploying deep learning frameworks and software systems. Despite these efforts, little is known about the open-source software communities involved in developing deep learning frameworks. In this paper, we perform a comparative investigation into the open-source software communities of two representative deep learning frameworks, PyTorch and TensorFlow. To facilitate the investigation, we compile a dataset of 2,792 and 3,288 code commit authors, along with 9,826 and 19,750 participants engaged in GitHub issue events, from the PyTorch and TensorFlow communities, respectively. With this dataset, we first characterize the structures of the two communities by employing four operationalizations to classify contributors into roles and by inspecting the contributions made by contributors common to both communities. We then conduct a longitudinal analysis to characterize how the two communities evolve across releases, in terms of the number of contributors in each role and the role transitions among contributors. Finally, we explore the causal effects between community characteristics and the popularity of the two frameworks. We find that the TensorFlow community has a larger base of contributors, with a higher proportion of core developers and a larger cohort of active users than the PyTorch community.
In terms of developers' technical background, 64.4% and 56.1% of developers in the PyTorch and TensorFlow communities, respectively, are employed by the companies leading the corresponding open-source projects, Meta and Google. Among core developers, 25.9% (PyTorch) and 21.9% (TensorFlow) hold Ph.D. degrees, and 77.2% and 77.7%, respectively, contribute to other machine learning or deep learning open-source projects. Developers contributing to both communities show some degree of spatial and temporal similarity in their pull requests across the two projects. In the PyTorch community, the number of contributors in each role shows a consistent upward trend over time, whereas the growth of contributors in the TensorFlow community exhibits a noticeable turning point. Both communities show a statistically significant decreasing trend in the inflow rates of core developers. Furthermore, we observe statistically significant causal effects between community expansion and core-developer retention, on the one hand, and the popularity of deep learning frameworks, on the other. Based on our findings, we discuss implications, provide recommendations for sustaining the open-source software communities of deep learning frameworks, and outline directions for future research.
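The abstract summarizes the longitudinal analysis of core-developer inflow without giving formulas. As a rough illustration only, the Python sketch below chains three steps under stated assumptions. Core developers in a release are identified with an 80% commit-share threshold, a common open-source heuristic that is not necessarily one of the paper's four operationalizations. The inflow rate is assumed to be the share of a release's core developers who were not core in the previous release. A decreasing trend is checked with the Mann-Kendall test, a choice made here because the abstract does not name the test used. The helper names `core_developers`, `inflow_rates`, and `mann_kendall` are hypothetical.

```python
import math
from collections import Counter


def core_developers(commit_authors, share=0.8):
    """Return the smallest set of top committers covering `share` of commits.

    Assumption: an 80% commit-share threshold, a common OSS heuristic; not
    necessarily one of the paper's four operationalizations.
    """
    counts = Counter(commit_authors)
    threshold = share * sum(counts.values())
    core, covered = set(), 0
    # Visit authors from most to fewest commits until the threshold is met.
    for author, n in counts.most_common():
        if covered >= threshold:
            break
        core.add(author)
        covered += n
    return core


def inflow_rates(core_by_release):
    """Share of each release's core developers who were not core in the
    previous release (hypothetical definition of the inflow rate)."""
    rates = []
    for prev, curr in zip(core_by_release, core_by_release[1:]):
        rates.append(len(curr - prev) / len(curr) if curr else 0.0)
    return rates


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test without tie correction.

    Returns (S, z, p); S < 0 with a small p suggests a decreasing trend.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF (computed via math.erf).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, z, p
```

In use, one would pass each release's commit authors to `core_developers`, feed the resulting sets to `inflow_rates`, and test the rates with `mann_kendall`; a steadily shrinking inflow yields a negative S and a small p. The Mann-Kendall test is used here because it is non-parametric and suits short monotonic series such as per-release rates; this sketch omits the tie correction to its variance.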

Understanding the OSS Communities of Deep Learning Frameworks: A Comparative Case Study of PyTorch and TensorFlow