Abstract: Over the past two decades, deep learning has achieved tremendous success in the development of software systems across various domains. Deep learning frameworks have been proposed to facilitate the development of such software systems, among which PyTorch and TensorFlow stand out as notable examples. Considerable attention has focused on exploring software engineering practices and addressing diverse technical aspects of developing and deploying deep learning frameworks and software systems. Despite these efforts, little is known about the open-source software communities involved in the development of deep learning frameworks. In this paper, we perform a comparative investigation into the open-source software communities of the two representative deep learning frameworks, PyTorch and TensorFlow. To facilitate the investigation, we compile a dataset of 2,792 and 3,288 code commit authors, along with 9,826 and 19,750 participants engaged in issue events on GitHub, from the PyTorch and TensorFlow communities, respectively. With the dataset, we first characterize the structures of the two communities by employing four operationalizations to classify contributors into various roles and inspect the contributions made by common contributors across the two communities. We then conduct a longitudinal analysis to characterize the evolution of the two communities across various releases, in terms of the numbers of contributors with various roles and role transitions among contributors. Finally, we explore the causal effects between community characteristics and the popularity of the two frameworks. We find that the TensorFlow community harbors a larger base of contributors, encompassing a higher proportion of core developers and a more extensive cohort of active users compared to the PyTorch community.
In terms of the technical background of the developers, 64.4% and 56.1% of developers in the PyTorch and TensorFlow communities are employed by the leading companies of the corresponding open-source software projects, Meta and Google, respectively. 25.9% and 21.9% of core developers in the PyTorch and TensorFlow communities possess Ph.D. degrees, while 77.2% and 77.7% contribute to other machine learning or deep learning open-source projects, respectively. Developers contributing to both communities demonstrate spatial and temporal similarities to some extent in their pull requests across the respective projects. The evolution of contributors with various roles exhibits a consistent upward trend over time in the PyTorch community. Conversely, a noticeable turning point in the growth of contributors characterizes the evolution of the TensorFlow community. Both communities show a statistically significant decreasing trend in the inflow rates of core developers. Furthermore, we observe statistically significant causal effects between the expansion of communities and the retention of core developers, on the one hand, and the popularity of deep learning frameworks, on the other. Based on our findings, we discuss implications, provide recommendations for sustaining open-source software communities of deep learning frameworks, and outline directions for future research.

An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks

Understanding the OSS Communities of Deep Learning Frameworks: A Comparative Case Study of PyTorch and TensorFlow

An Empirical Study of the Dependency Networks of Deep Learning Libraries

Is using deep learning frameworks free?: characterizing technical debt in deep learning frameworks

An Orchestrated Empirical Study on Deep Learning Frameworks and Platforms

Accurate Library Recommendation Using Combining Collaborative Filtering and Topic Model for Mobile Development

What Do Programmers Discuss about Deep Learning Frameworks

An Exploratory Study on the Introduction and Removal of Different Types of Technical Debt in Deep Learning Frameworks

An Empirical Study Towards Characterizing Deep Learning Development and Deployment Across Different Frameworks and Platforms

A detailed comparative study of open source deep learning frameworks

A Comprehensive Deep Learning Library Benchmark and Optimal Library Selection

DLBench: a comprehensive experimental evaluation of deep learning frameworks

DLBench: An Experimental Evaluation of Deep Learning Frameworks

An Overview Of Open Source Deep Learning-Based Libraries For Neuroscience

A comprehensive study on challenges in deploying deep learning based software

Demystifying Dependency Bugs in Deep Learning Stack

Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement

Why Do Deep Learning Projects Differ in Compatible Framework Versions? An Exploratory Study

A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices

An Exploratory Study of Deep Learning Supply Chain

Sustainability Forecasting for Deep Learning Packages