Abstract: Over the past two decades, deep learning has achieved tremendous success in developing software systems across various domains. Deep learning frameworks have been proposed to facilitate the development of such software systems, among which PyTorch and TensorFlow stand out as notable examples. Considerable attention has focused on exploring software engineering practices and addressing diverse technical aspects in developing and deploying deep learning frameworks and software systems. Despite these efforts, little is known about the open-source software communities involved in the development of deep learning frameworks. In this paper, we perform a comparative investigation into the open-source software communities of the two representative deep learning frameworks, PyTorch and TensorFlow. To facilitate the investigation, we compile a dataset of 2,792 and 3,288 code commit authors, along with 9,826 and 19,750 participants engaged in issue events on GitHub, from the two communities, respectively. With the dataset, we first characterize the structures of the two communities by employing four operationalizations to classify contributors into various roles and inspect the contributions made by common contributors across the two communities. We then conduct a longitudinal analysis to characterize the evolution of the two communities across various releases, in terms of the numbers of contributors with various roles and role transitions among contributors. Finally, we explore the causal effects between community characteristics and the popularity of the two frameworks. We find that the TensorFlow community harbors a larger base of contributors, encompassing a higher proportion of core developers and a more extensive cohort of active users compared to the PyTorch community.
In terms of the technical background of the developers, 64.4% and 56.1% of developers in the PyTorch and TensorFlow communities are employed by the leading companies of the corresponding open-source software projects, Meta and Google, respectively. In the PyTorch and TensorFlow communities, 25.9% and 21.9% of core developers possess Ph.D. degrees, while 77.2% and 77.7% contribute to other machine learning or deep learning open-source projects, respectively. Developers contributing to both communities demonstrate spatial and temporal similarities to some extent in their pull requests across the respective projects. The evolution of contributors with various roles exhibits a consistent upward trend over time in the PyTorch community. Conversely, a noticeable turning point in the growth of contributors characterizes the evolution of the TensorFlow community. Both communities show a statistically significant decreasing trend in the inflow rates of core developers. Furthermore, we observe statistically significant causal effects between two community characteristics, the expansion of communities and the retention of core developers, and the popularity of deep learning frameworks. Based on our findings, we discuss implications, provide recommendations for sustaining open-source software communities of deep learning frameworks, and outline directions for future research.

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Understanding the OSS Communities of Deep Learning Frameworks: A Comparative Case Study of PyTorch and TensorFlow

Torch.fx: Practical Program Capture and Transformation for Deep Learning in Python

AutoGraph: Imperative-style Coding with Graph-based Performance

TorchBench: Benchmarking PyTorch with High API Surface Coverage

Scorch: A Library for Sparse Deep Learning

TorchOpt: An Efficient Library for Differentiable Optimization

PyTorch Tabular: A Framework for Deep Learning with Tabular Data

Acceleration of Non-Linear Minimisation with PyTorch

A Data-Centric Optimization Framework for Machine Learning

Performance Evaluation of MindSpore and PyTorch Based on Ascend NPU

OpTorch: Optimized deep learning architectures for resource limited environments

TorchRL: A data-driven decision-making library for PyTorch

depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

TorchCP: A Python Library for Conformal Prediction

What Do Programmers Discuss about Deep Learning Frameworks

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research

PyTorch Metric Learning